  1. May 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Sammons, Masserini et al. examine the connectivity of different types of CA3 pyramidal cells ("thorny" and "athorny"), and how their connectivity putatively contributes to their relative timing in sharp-wave-like activity. First, using patch-clamp recordings, they characterize the degree of connectivity within and between athorny and thorny cells. Based upon these experimental results, they compute a synaptic product matrix, and use this to inform a computational model of CA3 activity. This model finds that this differential connectivity between these populations, augmented by two different types of inhibitory neurons, can account for the relative timing of activity observed in sharp waves in vivo.

      We thank the reviewer for reading our manuscript, as well as for their nice summary and constructive comments.

      Strengths:

      The patch-clamp experiments are exceptionally thorough and well done. These are very challenging experiments and the authors should be commended for their in-depth characterization of CA3 connectivity.

      Thank you for the recognition of our efforts.

      Weaknesses:

      (1) The computational elements of this study feel underdeveloped. Whereas the authors do a thorough job experimentally characterizing connections between excitatory neurons, the inhibitory neurons used in the model seem to be effectively "fit neurons" and appear to have been tuned to produce the emergent properties of CA3 sharp wave-like activity. Although I appreciate the goal was to implicate CA3 connectivity contributions to activity timing, a stronger relationship seems like it could be examined. For example, did the authors try to "break" their model? It would be informative if they attempted different synaptic product matrices (say, the juxtaposition of their experimental product matrix) and see whether experimentally-derived sequential activity could not be elicited. It seems as though this spirit of analysis was examined in Figure 4C, but only insofar as individual connectivity parameters were changed in isolation.

      Including the two interneuron types (B and C) in the model is, on the one hand, necessary to align our modeling framework to the state-of-the-art model by Evangelista et al. (2020), which assumes that these populations act as switchers between an SPW and a non-SPW state, and on the other hand, less straightforward because the connectivity involving these interneurons is largely unknown.

      For B cells, the primary criterion to set their connections to and from excitatory cells was to balance the effect of the strong recurrent excitation and to achieve a mid-range firing rate for each population during sharp wave events. Our new simulations (Figure 5B) show that the initial suppression of population T (resulting in the long delay) indeed depends in equal proportions on the outlined excitatory connections and on how strongly each excitatory population is targeted by the B interneurons. However, these simulations demonstrate that there is a broad, clearly distinct, region of the parameter space that supports a long delay between the peaks, rather than a marginal set of fine-tuned parameters. In addition, the simulations show that B interneurons optimally contribute to the suppression of T when they primarily target T (Fig. 5B, panels 3, 7, 11, 12, 13) rather than A (panels 4, 8, 9, 10, 11). On the contrary, as reported in the parameter table, and now also displayed graphically in the new Figure 4A (included above, with arrow sizes proportional to the synaptic product between the parameters determining the total strength of each connection), we assume B to target A more strongly than T (to make up for the higher excitability of population A). Therefore, the long delay between the peaks in our model emerges in spite of the interneuron connectivity, rather than because of it, and it is an effect of the asymmetric connectivity between the two excitatory populations, in particular the extremely low connection from A to T.

      (2) Additional explanations of how parameters for interneurons were incorporated in the model would be very helpful. As it stands, it is difficult to understand the degree to which the parameters of these neurons are biologically constrained versus used as fit parameters to produce different time windows of activity in types of CA3 pyramidal cells.

      Response included in point (1).

      Reviewer #2 (Public Review):

      Sharp wave ripples are transient oscillations occurring in the hippocampus that are thought to play an important role in organising temporal sequences during the reactivation of neuronal activity. This study addresses the mechanism by which these temporal sequences are generated in the CA3 region focusing on two different subtypes of pyramidal neurons, thorny and athorny. Using high-quality electrophysiological recordings from up to 8 pyramidal neurons at a time the authors measure the connectivity rates between these pyramidal cell subtypes in a large dataset of 348 cells. This is a significant achievement and provides important data. The most striking finding is how similar connection characteristics are between cell types. There are no differences in synaptic strength or failure rates and some small differences in connectivity rates and short-term plasticity. Using model simulations, the authors explore the implications of the differences in connectivity rates for the temporal specificity of pyramidal cell firing within sharp-wave ripple events. The simulations show that the experimentally observed connectivity rates may contribute to the previously observed temporal sequence of pyramidal cell firing during sharp wave ripples.

      Thank you very much for your careful review of our manuscript and the overall positive assessment.

      The conclusions drawn from the simulations are not experimentally tested so remain theoretical. In the simple network model, the authors include basket cell and anti-SWR interneurons but the connectivity of these cell types is not measured experimentally and variations in interneuron parameters may also influence temporal specificity of firing.

      As variations in some of these parameters can indeed influence the temporal specificity of firing, we have now performed additional simulations, the results of which are in the new Figures 5 and S5. Please also see response to Reviewer 1, point 1.

      In addition, the influence of short-term plasticity measured in their experiments is not tested in the model.

      We have now included short-term synaptic depression in all the excitatory-to-excitatory synapses and compensated for the weakened recurrent excitation by scaling some of the other parameters. The results of re-running our simulations in this alternative version of the model are reported in Figure S3 and are qualitatively analogous to those in Figure 4.
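
      As a rough illustration of what short-term synaptic depression does to a train of presynaptic spikes, the sketch below uses a generic resource-depletion rule. This is an assumption for illustration only: the response does not state which depression model or parameters were used, so the function name, `use_fraction`, and `tau_rec` are hypothetical.

```python
import math

def depressed_efficacies(spike_times, use_fraction=0.2, tau_rec=0.5):
    """Relative synaptic efficacy seen by each presynaptic spike.

    Generic resource-depletion sketch (hypothetical parameters, not the
    authors' model): each spike consumes a fraction `use_fraction` of the
    available resources x, which then recover towards 1 with time
    constant `tau_rec` (seconds).
    """
    x = 1.0          # fraction of available synaptic resources
    last_t = None
    efficacies = []
    for t in spike_times:
        if last_t is not None:
            # exponential recovery towards 1 since the previous spike
            x = 1.0 - (1.0 - x) * math.exp(-(t - last_t) / tau_rec)
        efficacies.append(x)
        x *= 1.0 - use_fraction   # depletion caused by this spike
        last_t = t
    return efficacies

# A fast spike train (10 ms intervals) is progressively depressed.
print(depressed_efficacies([0.0, 0.01, 0.02, 0.03]))
```

      Under such a rule, recurrent excitation is weakened most during high-rate activity, which is consistent with the need, noted above, to rescale other parameters to compensate.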

      Interestingly, the experimental data reveal a large variability in many of the measured parameters. This may strongly influence the firing of pyramidal cells during SWRs but it is not represented within the model which uses the averaged data.

      We have now incorporated variability in the following simulation parameters: the strength and latency of the four excitatory-to-excitatory connections as well as the reversal potential and leak conductance of both types of pyramidal cells, assuming variabilities similar to those observed experimentally (see Materials and Methods for details). Upon a slight re-balancing of some inhibitory connection strengths, in order to achieve comparable firing rates, we found that this version of the model also supports the generation of sharp waves with two pyramidal components (Figure S4B), and is, thus, fully analogous to our basic model. Varying the excitatory connectivities as in the original simulations (cf. Figure 4C and Figure S4C) reveals that increasing the athorny-to-athorny or decreasing the athorny-to-thorny connectivity still increases the delay between the peaks, although for some connectivity values the peak of the athorny population appears more spread out in time.

      Reviewer #3 (Public Review):

      Summary:

      The hippocampal CA3 region is generally considered to be the primary site of initiation of sharp wave ripples, highly synchronous population events involved in learning and memory, although the precise mechanism remains elusive. A recent study revealed that CA3 comprises two distinct pyramidal cell populations: thorny cells that receive mossy fiber input from the dentate gyrus, and athorny cells that do not. That study also showed that it is athorny cells in particular that play a key role in sharp wave initiation. In the present work, Sammons, Masserini, and colleagues expand on this by examining the connectivity probabilities among and between thorny and athorny cells. First, using whole-cell patch clamp recordings, they find an asymmetrical connectivity pattern, with athorny cells receiving the most synaptic connections from both athorny and thorny cells, and thorny cells receiving fewer. They then demonstrate in spiking neural network simulations how this asymmetrical connectivity may underlie the preferential role of athorny cells in sharp wave initiation.

      Strengths:

      The authors provide independent validation of some of the findings by Hunt et al. (2018) concerning the distinction between thorny and athorny pyramidal cells in CA3 and advance our understanding of their differential integration in CA3 microcircuits. The properties of excitatory connections among and between thorny and athorny cells described by the authors will be key in understanding CA3 functions including, but not limited to, sharp wave initiation.

      As stated in the paper, the modeling results lend support to the idea that the increased excitatory connectivity towards athorny cells plays a key role in causing them to fire before thorny cells in sharp waves. More generally, the model adds to an expanding pool of models of sharp wave ripples which should prove useful in guiding and interpreting experimental research.

      Thank you very much for your careful review of our manuscript and this positive assessment.

      Weaknesses:

      The mechanism by which athorny cells initiate sharp waves in the model is somewhat confusingly described. As far as I understood, random fluctuations in the activities of A and B neurons provide windows of opportunity for pyramidal cells to fire if they have additionally recovered from adaptive currents. Thorny and athorny pyramidal cells are then set in a winner-takes-all competition which is quickly won by the athorny cells. The main thesis of the paper seems to be that athorny cells win this competition because they receive more inputs both from themselves and from thorny cells, hence, the connectivity "underlies the sequential activation". However, it is also stated that athorny cells activate first due to their lower rheobase and steeper f-I curve, and it is also indicated in the methods that athorny (but not thorny) cells fire in bursts. It seems that it is primarily these features that make them fire first, something which apparently happens even when the A to A connectivity is set to 0, albeit with a very small lag. Perhaps the authors could further clarify the differential role of single cell and network parameters in determining the sequential activation of athorny and thorny cells. Is the role of asymmetric excitatory connectivity only to enhance the initial intrinsic advantage of athorny cells? If so, could this advantage also be enhanced in other ways?

      Thank you for the time invested in the review of our manuscript. We especially thank you for pointing out that the description of these dynamics was unclear: we have now improved it in the main text and we provide here an additional summary. As correctly highlighted by Reviewer 3, athorny neurons (A) are more excitable than thorny (T) ones due to single-neuron parameters: therefore, if there is a winner-takes-all competition, they are going to win it. Whether there is a competition in the first place, however, depends on the excitatory (and inhibitory) connections. In particular, we should distinguish two questions: does the activity of populations A and B (PV baskets), without adaptation (so at the beginning of the sharp wave) suppress T? And does the activity of populations T and B suppress A?

      The four possible combinations can be appreciated, for example, in the new Figure 5A5. If A can suppress T, but T cannot suppress A (low A-to-T, high T-to-A, bottom right corner, like in the data), A “wins” and T fires later, after a long delay. If both A and T can suppress each other (both cross-connections are low, bottom left corner), we still get the same outcome: A wins because of its earlier and sharper onset (due to single-neuron parameters). If neither population can suppress the other (high cross-connections, top right corner), then there is no competition and the populations reach the peak approximately at the same time. Only in the case in which T can suppress A, but A cannot suppress T (low T-to-A, high A-to-T, top left corner, opposite to the data), then A “loses” the competition. However, since A neurons nevertheless display some early activity (again, due to the single neuron parameters), this scenario is not as clean as the reversed one: rather, A cells have an initial, small peak, then T neurons quickly take over and grow to their own peak, and then, depending on how strongly T neurons suppress A neurons, there may or may not be a second peak for the A neurons. This is the reason why, in the top left corner of Figure 5B, the statistics show either a long positive or long negative delay, depending on whether the first (small) or second (absent, for some parameters) peak of A is taken into account. In summary, the experimentally measured connectivity does not only enhance the initial intrinsic advantage of A cells, but sets up the competitive dynamics in the first place, which are crucial for the emergence of two distinct peaks, rather than a single peak involving both populations.

      Although a clear effort has been made to constrain the model with biological data, too many degrees of freedom remain that allow the modeler to make arbitrary decisions. This is not a problem in itself, but perhaps the authors could explain more of their reasoning and expand upon the differences between their modeling choices and those of others. For example, what are the conceptual or practical advantages of using adaptation in pyramidal neurons as opposed to short-term synaptic plasticity as in the model by Hunt et al.?

      It should be pointed out that the model by Hunt et al. features adaptation in pyramidal neurons as well, as the neuronal units employed are also adaptive-exponential integrate-and-fire. In an early stage of this project, we obtained from Hunt et al. the code for their model, and ascertained that adaptation is the main mechanism governing the alternations between the sharp-wave and the non-sharp-wave states, to the extent that fully removing short-term plasticity from their model does not have any significant impact on the network dynamics. Therefore, our choices are, in this regard, fully consistent with theirs. In order to confirm that synaptic depression does not significantly impact the dynamics in our model either, we have now performed additional simulations (Figure S3), addressed in the main text (lines 149-151) and in the response to Reviewer 2, who expressed similar concerns.

      Relatedly, what experimental observations could validate or falsify the proposed mechanisms?

      As sharp wave generation in this model relies on disinhibitory dynamics (suppression of the anti-sharp-wave interneurons C), the model could be validated/falsified by proving/disproving that a class of interneurons with anti-sharp-wave features exists. In addition, the mechanism we proposed for the long delay between the peaks of the athorny and thorny activity requires at least some connectivity from athorny to basket and from basket to thorny neurons.

      In the data by Hunt et al., thorny cells have a higher baseline (non-SPW) firing rate, and it is claimed that it is actually stochastic correlations in their firing that are amplified by athorny cells to initiate sharp waves. However, in the current model, the firing of both types of pyramidal cells outside of ripples appears to be essentially zero. Can the model handle more realistic firing rates as described by Hunt et al., or as produced by e.g., walking around an environment tiled with place cells, or would that trigger SPWs continuously?

      When building this model, we aimed at having two clearly distinct states the network could alternate between, so we picked a rather polarized connectivity to and from the anti-sharp wave cells (C), resulting in polarized states. As a result, we obtain a low, although non-zero, activity of pyramidal neurons in non-SPW states (0.4 spikes/s for athorny and 0.2 spikes/s for thorny). These assumptions can be partially relaxed, for example in the original model by Evangelista et al. (2020), where the background firing rate of pyramidal cells is ~2 spikes/s. It should also be noted that, when walking in an environment tiled with place cells, the hippocampus is subject to additional extra-hippocampal inputs (e.g. from the medial septum, resulting in theta oscillations) and to neuromodulation, which can alter the network in various ways that we have not included in our model. However, our results are not in contradiction to transient SPW-like activity states initiated at a certain phase of the theta oscillation, when the inhibition is weakest.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript reads like it was intended as a short-form manuscript for another journal. The introduction and discussion in particular are very brief and would benefit from being expanded and providing a bigger picture for the reader.

      We had originally aimed to submit in the eLife “short report” format. However, also thanks to the suggestion of Reviewer 1, we realized that our text would be better supported by extended introduction and discussion sections, as well as additional figures.

      (2) Graphs would benefit from including all datapoints, where appropriate.

      All datapoints have now been added to boxplots in the main figures and supplement.

      (3) The panels of Figure 4 are laid out strangely, it may be worthwhile to adjust.

      We thank the reviewer for this suggestion. We have now adjusted the layout of Figure 4 and believe it is now easier to follow.

      Reviewer #2 (Recommendations For The Authors):

      Useful points to address include:

      (1) Explore within the model the effect of altering interneuron connectivity. Are there other factors that can influence temporal specificity within SWRs?

      The effects of varying the connectivity to and from B interneurons (the ones which are SPW-active and therefore relevant for temporal specificity) have now been investigated in the new Figure 5B, in which such parameters were varied in pairs or combined with the two most relevant excitatory-to-excitatory connections.

      (2) Implement the experimentally observed short-term plasticity in the model to determine how this influences temporal specificity.

      All the findings in Figure 4 have now been replicated in the new Figure S3, in which excitatory-to-excitatory synapses feature synaptic depression.

      (3) Consider if it is possible to incorporate observed experimental variability in the model and explore the implications.

      All the findings in Figure 4 have now been replicated in the new Figure S4, in which heterogeneity has been introduced in multiple neuronal and synaptic parameters of thorny and athorny neurons.

      (4) Include the co-connectivity rates in the data. Ie how many of the recorded neurons are reciprocally connected? Does this change the model simulations?

      We have now added the rates of reciprocal connections that we observed into the main text (lines 86-88). We found 2 pairs of reciprocally connected athorny neurons and 2 pairs of reciprocally connected thorny neurons. These rates of reciprocity were not statistically significant. We did not observe reciprocal connections in other paired neuron combinations (i.e. athorny-thorny or vice-versa). Co-connectivity does not have any effect on the model simulations, as the model includes thousands of neurons grouped in populations without specific sub-structures. It might, however, be more relevant if the excitatory populations were further subdivided in assemblies.

      Reviewer #3 (Recommendations For The Authors):

      (1) Specify which part of CA3 you are recording from.

      We have added this information into our results section - we recorded from 20 cells in CA3a, 274 cells in CA3b and 54 cells in CA3c. This information can now be found in the text on lines 68-69.

      (2) Comment on why you might observe a larger fraction of athorny cells than Hunt et al.

      Hunt et al. cite a broad range for the fraction of athorny cells in their discussion (10-20%). It is unclear where these estimates originate from. In their study, Hunt et al. use the bursting and non-bursting phenotypes as proxies for athorny and thorny cells respectively, and report here numbers of 32 and 70, equating to 31% athorny and 69% thorny. This fraction of athorny cells is more or less in line with our own findings, albeit slightly lower (34% and 66%). However, we believe this difference falls within the range of experimental variability. One caveat is that our electrophysiological recordings likely represent a biased sample of cells. In particular, with multipatch recordings, placement of later electrodes is often restricted to the borders of the pyramidal layer so as not to disturb already patched cells. Thus, our recorded cells do not represent a fully random sample of CA3 pyramidal cells. We believe that only once a reliable genetic marker for athorny cells has been established can the size of this cell population be properly estimated. Furthermore, the ratio of thorny and athorny cells varies along the proximal-distal axis of the CA3, so differences in ratios seen between our study and Hunt et al. may arise from sampling differences along this axis.
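
      The percentages quoted above for the Hunt et al. counts follow directly from the two cell numbers. A minimal check is sketched below; the function name is hypothetical and only the counts 32 and 70 come from the text.

```python
def cell_fractions(n_athorny, n_thorny):
    """Return the (athorny, thorny) fractions of the total cell count."""
    total = n_athorny + n_thorny
    return n_athorny / total, n_thorny / total

# Counts reported for bursting (athorny proxy) and non-bursting (thorny proxy) cells.
athorny, thorny = cell_fractions(32, 70)
print(f"{athorny:.0%} athorny, {thorny:.0%} thorny")  # prints "31% athorny, 69% thorny"
```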

      (3) In Figure 3, Aiii (the cell fractions) could also be represented as a vector of two squares stacked one on top of the other, then you could add multiplication signs between Ai, Aii and Aiii, and an equal sign between Aiii and Aiv.

      Thank you! We have implemented this very nice suggestion.

      (4) In Figure 4A, it would be helpful to display the strength of the connections similar to how it is done in Figure 3B.

      We thank the reviewer for this suggestion. We have now updated Fig 4A to include connection strengths.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cognitive and brain development during the first two years of life is vast and determinant for later development. However, longitudinal infant studies are complicated and restricted to occidental high-income countries. This study uses fNIRS to investigate the developmental trajectories of functional connectivity networks in infants from a rural community in Gambia. In addition to resting-state data collected from 5 to 24 months, the authors collected growth measures from birth until 24 months and administered an executive functioning task at 3 or 5 years old.

      The results show left and right frontal-middle and right frontal-posterior negative connections at 5 months that increase with age (i.e., become less negative). Interestingly, contrary to previous findings in high-income countries, there was a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months. Additionally, the study describes that some connectivity patterns related to better cognitive flexibility at pre-school age.

      Strengths:

      - The authors analyze data from 204 infants from a rural area of Gambia, already a big sample for most infant studies. The study might encourage more research on different underrepresented infant populations (i.e., infants not living in occidental high-income countries).

      - The study shows that fNIRS is a feasible instrument to investigate cognitive development when access to fMRI is not possible or outside a lab setting.

      - The fNIRS data preprocessing and analysis are well-planned, implemented, and carefully described. For example, the authors report how the choices in the parameters for the motion artifacts detection algorithm affect data rejection and show how connectivity stability varies with the length of the data segment to justify the threshold of at least 250 seconds free of artifacts for inclusion.

      - The authors use proper statistical methods for analysis, considering the complexity of the dataset.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - No co-registration of the optodes is implemented. The authors checked for correct placement by looking at pictures taken during the testing session. However, head shape and size differences might affect the results, especially considering that the study involves infants from 5 months to 24 months and that the same fNIRS array was used at all ages.

      The fNIRS array used in this work was co-registered onto age-appropriate MNI templates at every time point in a previously published work: L. H. Collins-Jones et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’. The procedure mentioned by the reviewer, involving the examination of pictures showing the placement of headbands on participants, aimed to exclude infants with excessive cap displacement from further analysis.

      - The authors regress the global signal to remove systemic physiological noise. While the authors also report the changes in connectivity without global signal regression, there are some critical differences. In particular, the apparent decrease in frontal inter-hemispheric connections is not present when global signal regression is omitted, even though it is present for deoxy-Hb. The authors use connectivity results obtained after applying global signal regression for further analysis. The choice of regressing the global signal is questionable since it has been shown to introduce anti-correlations in fMRI data (Murphy et al., 2009), and fNIRS in young infants does not seem to be highly affected by physiological noise (Emberson et al., 2016). Systemic physiological noise might change at different ages, which makes its removal critical to investigate functional network development. However, global signal regression might also affect the data differently. The study would have benefited from having short separation channels to measure the systemic physiological component in the data.

      The work of Emberson et al. (2016) mentioned by the reviewer indeed highlights the challenges of removing systemic changes from the infants’ haemodynamic signal with short-separation channels (SSC). In fact, even an SSC of 1 cm detected changes in the blood in the brain; therefore, by regressing this signal from the recorded one, the authors removed both systemic changes and haemodynamic signal. This paper by Emberson et al. (2016) is taken as a reference in the field to suggest that SSC might not be an ideal tool to remove systemic changes when collecting fNIRS data on young infants, as we did in this work.

      We agree with the reviewer's observation that systemic physiological noise may vary with age and among infants. Therefore, for each infant at each age, we regressed the mean value calculated across all channels. This ensures that the regressed signal is not biased by averaged calculations at group levels.
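
      The per-infant, per-age regression described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes the data for one session are stored as a samples-by-channels array, that the global signal is the plain mean across channels, and that it is removed by ordinary least squares with an intercept. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def regress_global_signal(data):
    """Remove the across-channel mean signal from every channel.

    data: array of shape (n_samples, n_channels) for ONE infant at ONE age,
    so the regressor is never an average computed at the group level.
    """
    data = np.asarray(data, dtype=float)
    global_signal = data.mean(axis=1)
    # Design matrix: intercept plus the global signal.
    design = np.column_stack([np.ones_like(global_signal), global_signal])
    # Least-squares fit of each channel on the design matrix.
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    # Residuals are the channel signals with the global component removed.
    return data - design @ beta

# Synthetic example: 34 channels sharing a common (systemic-like) component.
rng = np.random.default_rng(0)
shared = rng.standard_normal(500)
session = shared[:, None] + 0.5 * rng.standard_normal((500, 34))
cleaned = regress_global_signal(session)
```

      Because the residuals are orthogonal to the regressor, any correlation that channels share only through the global mean is removed, which is also why this step can shift correlation values towards negative ones.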

      We are aware of the criticisms directed towards global signal regression in the fMRI literature, although some other works showed anticorrelations in functional connectivity networks both with and without global signal regression (Chai et al., 2012). Furthermore, Murphy himself revised his criticism of the use of global signal regression in functional connectivity analysis in one of his more recent works (Murphy & Fox, 2017). The fact that the decreased FC is significant in results from data pre-processed without global signal regression gives us confidence that this finding is statistically robust and not solely driven by this preprocessing choice in our pipeline.

      An interesting study by Abdalmalak et al. (2022) demonstrated that failing to correct for systemic changes using any method is inappropriate when estimating FC with fNIRS, as it can lead to a high risk of elevated connectivity across the whole brain (see Figure 4 of the mentioned paper). Consequently, we strongly advocate for the implementation of global signal regression in our analysis pipeline as a fundamental step for accurate functional connectivity estimations.

      References:

      Emberson, L. L., Crosswhite, S. L., Goodwin, J. R., Berger, A. J., & Aslin, R. N. (2016). Isolating the effects of surface vasculature in infant neuroimaging using short-distance optical channels: a combination of local and global effects. Neurophotonics, 3(3), 031406.

      Chai, X. J., Castañón, A. N., Öngür, D., & Whitfield-Gabrieli, S. (2012). Anticorrelations in resting state networks without global signal regression. NeuroImage, 59(2), 1420–1428. https://doi.org/10.1515/9783050076010-014

      Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154(November 2016), 169–173. https://doi.org/10.1016/j.neuroimage.2016.11.052

      Abdalmalak, A., Novi, S. L., Kazazian, K., Norton, L., Benaglia, T., Slessarev, M., ... & Owen, A. M. (2022). Effects of systemic physiology on mapping resting-state networks using functional near-infrared spectroscopy. Frontiers in neuroscience, 16, 803297.

      - I believe the authors bypass a fundamental point in their framing. When discussing the results, the authors compare the developmental trajectories of the infants tested in a rural area of Gambia with the trajectories reported in previous studies on infants growing in occidental high-income countries (likely in urban contexts) and attribute the differences to adverse effects (i.e., nutritional deficits). Differences in developmental trajectories might also derive from other environmental and cultural differences that do not necessarily lead to poor cognitive development.

      We agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to investigate this further” (line 238).

      - While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level, the evidence regarding the links between adverse situations, developmental trajectories, and later cognitive capacities is weaker. The authors find that early restricted growth predicts specific connectivity patterns at 24 months and that certain connectivity patterns at specific ages predict cognitive flexibility. However, the link between development trajectories (individual changes in connectivity) with growth and later cognitive capacities is missing. To address this question adequately, the study should have compared infants with different growing profiles or those who suffered or did not from undernutrition. However, as the authors discussed, they lacked statistical power.

      We agree with the reviewer, and indeed we highlighted this as one of the main limitations of our work: “Even given the large sample in our study, we were underpowered to test for group comparisons between sets of infants with distinct undernutrition growth profiles, e.g., infants with early poor growth that later resolved and infants with standard growth early that had a poor growth later. We were also underpowered to test the associations between early growth and FC on clinically undernourished infants (defined as having DWLZ two standard deviations below the mean)” (line 311, discussion section).

      We believe this is an important point to consider for the field, as it addresses the sample size required for studies investigating brain development in clinically malnourished infants. We hope this will serve as a valuable reference for future studies in the field. For example, a new study led by Prof. Sophie Moore and other members of the BRIGHT team (INDiGO) is currently recruiting six hundred pregnant women with the aim of obtaining a broader distribution of infants’ growth measures (https://www.kcl.ac.uk/research/sophie-moore-research-group).

      Reviewer #2 (Public Review):

      Summary and strengths:

      The article pertains to a topic of importance, specifically early life growth faltering, a marker of undernutrition, and how it influences brain functional connectivity and cognitive development. In addition, the data collection was laborious, and data preprocessing was quite rigorous to ensure data quality, utilizing cutting-edge preprocessing methods.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      However, the subsequent analysis and explanations were not very thorough, which made some results and conclusions less convincing. For example, corrections for multiple tests need to be consistently maintained; if the results do not survive multiple corrections, they should not be discussed as significant results. Additionally, alternative plans for analysis strategies could be worth exploring, e.g., using ΔFC in addition to FC at a certain age. Lastly, some analysis plans lacked a strong theoretical foundation, such as the relationship between functional connectivity (FC) between certain ROIs and the development of cognitive flexibility.

      Thus, as much as I admire the advanced analysis of connectivity that was conducted and the uniqueness of longitudinal fNIRS data from these samples (even the sheer effort to collect fNIRS longitudinally in a low-income country at such a scale!), I have reservations about the importance of this paper's contribution to the field in its present form. Major revisions are needed, in my opinion, to enhance the paper's quality. 

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings as well as hypothesis-generating findings that may not pass stringent significance thresholds. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      The relationship between FC and cognitive flexibility (as well as the relationship between growth and FC) has been explored focusing on those FC that showed a significant change with age, as specified in the results sections: ‘To investigate the impact of early nutritional status on FC at 24 months, we used multiple regression with the infant growth trajectory [...] and FC at 24 months [...]. To maximise power, we considered only those FC that showed a statistically significant change with age’ (line 183) and ‘To investigate whether FC early in life predicted cognitive flexibility at preschool age, we used multiple regression of FC across the first two years of life against later cognitive flexibility in preschoolers at three and five years. As per the analysis above, we focused on only those FC that showed a statistically significant change with age’ (line 198).

      We explored the possibility of investigating the relationship between changes in FC and changes in growth. However, the degrees of freedom in these analyses dropped dramatically (~25/30), thereby putting the significance and the meaning of the results at risk. We look forward to future longitudinal studies with less attrition across these time points to maintain the statistical power necessary to run such analyses.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants has a different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility at 4-5 years of age, but results did not survive corrections for multiple comparisons.

      Strengths:

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little-studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - Analyzing such a huge amount of collected data at several ages is not an easy task to test developmental relationships between growth, FC, and behavioral capacities. In its present form, this study and the performed analyses lack clarity, unity and perhaps modeling, as it suggests that all possible associations were tested in an exploratory way without clear mechanistic hypotheses. Would it be possible to specify some hypotheses to reduce the number of tests performed? In particular, considering metrics at specific ages or changes in the metrics with age might allow us to test different hypotheses: the authors might clarify what they expect specifically for growth-FC-behaviour associations. Since some FC measures and changes might be related to one another, would it be reasonable to consider a dimensionality reduction approach (e.g., ICA) to select a few components for further correlation analyses?

      We confirm that this work was motivated by a compelling theoretical question: whether neural mechanisms, specifically FC, can be influenced by early adversity, such as growth, and subsequently impact cognitive outcomes, such as cognitive flexibility. This aligns with the overarching goal of the BRIGHT project, established in 2015 (Lloyd-Fox, 2023). We believe this was evident throughout the manuscript in several instances, for example:

      - “The goal of the study was to investigate early physical growth in infancy, developmental trajectories of brain FC across the first two years of life, and cognitive outcome at school age in a longitudinal cohort of infants and children from rural Gambia, an environment with high rates of maternal and child undernutrition. Specifically, we aimed to: (i) investigate whether differences in physical growth through the first two years of life are related to FC at 24 months, and (ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children.” (page 4, introduction)

      - “This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age.” (page 6, discussion)

      - We had a clear hypothesis regarding short-range connectivity decreasing with age and long-range connectivity increasing with age, as stated at the end of the introduction: “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (page 4, line 147). However, we were not able to formulate clear hypotheses about the localization of these connections due to the scarcity of previous studies conducted within this age range, particularly in low-resource settings. The ROI approach for analysis was chosen to mitigate this challenge by reducing the number of comparisons while still enabling us to estimate the developmental trajectories of all the connections from which we acquired data.

      Regarding the use of a dimensionality reduction approach, we did not consider ICA in our analysis. These methods require selecting a fixed number of components to remove from all participants. However, due to the high variability of infant fNIRS data across the five timepoints, we considered it untenable to precisely determine the number of components to remove at the group level. Such a procedure carries the risk of over-cleaning the data for some participants while leaving noise in for others (Di Lorenzo, 2019). We also felt that using PCA in this initial study would be beyond the scope of the brain-region-specific hypotheses and would be more appropriate in a follow-up analysis of these important data.

      References:

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      Di Lorenzo, R., Pirazzoli, L., Blasi, A., Bulgarelli, C., Hakuno, Y., Minagawa, Y., & Brigadoi, S. (2019). Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems. NeuroImage, 200(April), 511–527.

      - It seems that neurodevelopmental trajectories over the whole period (5-24 months) are little investigated, and considering more robust statistical analyses would be an important aspect to strengthen the results. The discussion mentions the potential use of structural equation modelling analyses, which would be a relevant way to better describe such complex data.

      We appreciate the complexity of the dataset we are working with, which includes multiple measures and time points. Currently, our focus within the outputs from the BRIGHT project is on examining the relationship between selected measures. While this may not involve statistically advanced modelling at the moment, it is worth noting that most of the results presented in this work have survived correction for multiple comparisons, indicating their statistical robustness. We believe that more advanced statistical analyses are beyond the scope of this rich initial study. In the next phase of the project, known as BRIGHT IMPACT, our team is collaborating with statisticians and experts in statistical modelling to apply more sophisticated and advanced statistical techniques to the data.

      - Given the number of analyses performed, only describing results that survive correction for multiple comparisons is required. Unifying the correction approach (FDR / Bonferroni) is also recommended. For the association between cognitive flexibility and FC, results are not significant, and one might wonder why FC at specific ages was considered rather than the change in FC with age. One of the relevant questions of such a study would be whether early growth and later cognitive flexibility are related through FC development, but testing this would require a mediation analysis that was not performed.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      We did not perform a mediation analysis because (i) ΔWLZ between birth and the subsequent time points positively predicted frontal interhemispheric FC at 24 months, whereas (ii) it was frontal interhemispheric FC at 18 months (and right fronto-posterior connectivity at 24 months) that predicted cognitive flexibility at preschool age. Since frontal interhemispheric FC at 24 months, which was positively predicted by growth, did not significantly predict cognitive outcome at preschool age, a mediation model was not warranted.

      The reviewer raised concerns about using different methods to correct for multiple comparisons throughout the work. Results showing changes in FC with age were Bonferroni corrected, while we used FDR correction for the regression analyses investigating the relationship between growth and FC, as well as FC and cognitive flexibility. Both methods have good control over Type I errors (false positives), but Bonferroni is very conservative, increasing the likelihood of Type II errors (false negatives). We considered Bonferroni an appropriate method for correcting results showing changes in FC with age, where we had a large sample with strong statistical power (i.e. linear mixed models with 132 participants who had at least 250 seconds of good data for 2 out of 5 visits). However, Bonferroni was too conservative for the regression analyses, where N ranged between 57 and 78 (Acharya, 2014; Félix & Menezes, 2018; Narkevich et al., 2020; Narum, 2006; Olejnik et al., 1997).
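
      The contrast between the two corrections can be illustrated with a minimal sketch. It assumes the standard definitions (Bonferroni rejects when p ≤ α/m; Benjamini-Hochberg rejects the k smallest p-values, where k is the largest rank with p(k) ≤ (k/m)·α). The function names and the p-values below are hypothetical and invented for illustration; they are not results from the study.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values, invented for illustration only (not study results).
example_pvals = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.450, 0.700]
n_bonferroni = sum(bonferroni_reject(example_pvals))
n_bh = sum(benjamini_hochberg_reject(example_pvals))
print(n_bonferroni, n_bh)
```

      On the same set of p-values, the Bonferroni threshold is fixed at α/m, whereas the Benjamini-Hochberg threshold grows with rank, which is why it typically retains more findings when several tests have small p-values.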

      References:

      Acharya, A. (2014). A Complete Review of Controlling the FDR in a Multiple Comparison Problem Framework--The Benjamini-Hochberg Algorithm. ArXiv Preprint ArXiv:1406.7117.

      Félix, V. B., & Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1), 74–91.

      Narkevich, A. N., Vinogradov, K. A., & Grjibovski, A. M. (2020). Multiple comparisons in biomedical research: the problem and its solutions. Ekologiya Cheloveka (Human Ecology), 27(10), 55–64.

      Narum, S. R. (2006). Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783–787.

      Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22(4), 389–406.

      - Growth is measured at different ages through different metrics. Justifying the use of weight-for-length z-scores would be welcome since weight-for-age z-scores might be a better marker of growth and possible undernutrition (this impacting potentially both weight and length). Showing the distributions of these z-scores at different ages would allow the reader to estimate the growth variability across infants.

      We consistently used WLZ as the metric to measure growth throughout. Our analysis investigating the relationship between growth (WLZ) and FC included HCZ at 7/14 days to correct for head size at birth. When selecting the best growth measure for this paper, we opted for WLZ over WAZ, given extant evidence that infants in our sample are smaller and shorter compared to the reference WHO standard for the same age group (Nabwera et al., 2017). Therefore, using WLZ allows us to adjust each infant's weight for its own length.

      References:

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      - Regarding FC, clarifications about the long-range vs short-range connections would be welcome, as well as drawing a summary of what is expected in terms of FC "typical" trajectory, for the different brain regions and connections, as a marker of typical development. For instance, the authors suggest that an increase in long-range connectivity vs a decrease in short-range is expected based on previous fNIRS studies. However anatomical studies of white matter growth and maturation would suggest the reverse pattern (short-range connections developing mostly after birth, contrarily to long-range connections prenatally).

      We expected an increase in long-range functional connectivity with age, as discussed in the introduction:

      - “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). This maturation in FC has been shown to be related to the cascading maturation of myelination and synaptogenesis (32, 33) - fundamental processes for healthy brain development (34)” (line 93, page 3, introduction);

      - “Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 103, page 3, introduction);

      - “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (line 147, page 4, introduction).

      Since inferences about FC patterns recorded with fNIRS are highly limited by the number and locations of the optodes, it is challenging to make strong inferences about specific brain regions. Moreover, infant FC fNIRS studies are still limited, which is why we focused our inferences on long-range versus short-range connectivity, without specifically pinpointing particular brain regions.

      Additionally, we were unable to locate the works mentioned by the reviewer regarding an increase in short-range white matter connectivity immediately after birth. On the contrary, we found several studies documenting an increase in white-matter long-range connectivity after birth, which is consistent with the hypothesised increase in FC long-range connectivity, such as:

      Yap, P. T., Fan, Y., Chen, Y., Gilmore, J. H., Lin, W., & Shen, D. (2011). Development trends of white matter connectivity in the first years of life. PloS one, 6(9), e24678.

      Dubois, J., Dehaene-Lambertz, G., Kulikova, S., Poupon, C., Hüppi, P. S., & Hertz-Pannier, L. (2014). The early development of brain white matter: a review of imaging studies in fetuses, newborns and infants. Neuroscience, 276, 48-71.

      Stephens, R. L., Langworthy, B. W., Short, S. J., Girault, J. B., Styner, M. A., & Gilmore, J. H. (2020). White matter development from birth to 6 years of age: a longitudinal study. Cerebral Cortex, 30(12), 6152-6168.

      Hagmann, P., Sporns, O., Madan, N., Cammoun, L., Pienaar, R., Wedeen, V. J., ... & Grant, P. E. (2010). White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44), 19067-19072.

      Collin, G., & van den Heuvel, M. P. (2013). The ontogeny of the human connectome: development and dynamic changes of brain connectivity across the life span. The Neuroscientist, 19(6), 616-628. https://doi.org/10.1177/1073858413503712

      The authors test associations between FC and growth, but making sense of such modulation results is difficult without a clearer view of developmental changes per se (e.g., what does an early negative FC mean? Is it an increase in FC when the value gets close to 0? In particular, at 24m, it seems that most FC values are not significantly different from 0, Figure 2B). Observing positive vs negative association effects depending on age is quite puzzling. It is also questionable, for some correlation analyses with cognitive flexibility, to focus on FC that changes with age but to consider FC at a given age.

      We thank the reviewer for bringing up this important point and understand that it requires some additional consideration. The negative FC values decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age. The trajectory seems to suggest that this will keep increasing with age but of course further data need to be collected to assess this.

      Unfortunately, when considering ΔFC to predict cognitive flexibility, the number of participants dropped substantially, with N=~15/20 infants per group of preschoolers, leaving too little statistical power to interpret the results meaningfully.

      - The manuscript uses inappropriate terms "to predict", "prediction" whereas the conducted analyses are not prediction analyses but correlational.

      We thank the reviewer for giving us the opportunity to thoroughly revise the manuscript on this matter. In this work, we had clear hypotheses regarding which variables predicted which measures (such as growth predicting FC and FC predicting cognitive outcomes). Therefore, we performed regression analyses rather than correlational analyses to investigate these associations. Hence, we believe that using the terms ‘predict’ and ‘prediction’ is appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the introduction and discussion, the authors talk about the link between developmental trajectories and cognitive capacities, and undernutrition. However, they did not compare developmental trajectories but connectivity patterns at different ages with ΔWLZ and cognitive flexibility. I recommend that the authors rephrase the introduction and discussion.

      We thank the reviewer for pointing out places requiring better clarity in the text. We made edits through the introduction to better match our investigations. In particular we changed:

      - ‘our understanding of the relationships between early undernutrition, developmental trajectories of brain connectivity, and later cognitive outcomes is still very limited,’ to, ‘our understanding of the relationships between early undernutrition, brain connectivity, and later cognitive outcomes is still very limited’ (line 89, introduction);

      - ‘(ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children,’ to, ‘(ii) investigate if early FC has an impact on cognitive outcome at pre-school age in these children’ (line 137, introduction);

      - ‘This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age,’ to, ‘This study investigated how early adversity via undernutrition drives brain functional connectivity throughout the first two years of life and how these early functional connections are associated with cognitive flexibility at preschool age’ (line 215, discussion).

      (2) Considering most research is done in occidental high-income countries, and this work is one of the few presenting research in another context, I think the authors should discuss in the manuscript that differences with previous studies might also be due to environmental and cultural differences. Since the study lacks the statistical power to perform a statistical analysis that directly establishes a link between developmental trajectories and restricted growth and cognitive flexibility, the authors cannot disentangle which differences are related to undernutrition and which might result from growing up in a different environment. I recommend that the authors avoid phrases like (lines 57-58): "We observed that early physical growth before the fifth month of life drove optimal developmental trajectories of FC..." or (lines 223-224) "...our cohort of Gambian infants exhibit atypical developmental trajectories of functional connectivity...".

      We thank the reviewer for this observation, and we agree that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to explore this further” (line 238). We revised the whole manuscript to reflect similar statements.

      (3) To better interpret the results, it would be interesting to know if poor early growth predicts late cognitive flexibility in the tested sample and if the ΔWLZ distributions differ compared to a population in a high-income country where undernutrition is less frequent.

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      Mean and SD values of WLZ are reported in Table 3. The values at every age are negative, indicating that the infants' weight-for-length is below the expected norm at all ages. To our knowledge, no other studies have assessed changes in growth in an infant sample with similar closely spaced age time points in high-income countries, making comparisons on growth changes challenging.

      (4) It is unclear why WLZ at birth and HCZ at 7-14 days are included in the models. I imagine this is to ensure that differences are not due to growing restrictions before birth. It would be nice if the authors could explain this.

      As the reviewer pointed out, HCZ at 7-14 days was included to ensure associations between growth and FC are not due to physical differences at birth. This can be considered a 'baseline' measure for cerebral development, in the same way that WLZ at birth was used as a baseline for physical development. Therefore, we can more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth. We specified this in the manuscript as follows: “These analyses were adjusted by WLZ at birth and HCZ at 7/14 days, to more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth” (line 520, statistical analysis section in the method section).

      (5) Right frontal-posterior connections at 24 months negatively correlate with ΔWLZ. Thus, restricted growth results in stronger frontal-posterior connections at 24 months. However, the same connections at 24 months positively correlate with cognitive flexibility (stronger connections predict better cognitive flexibility). Do the authors have any interpretation of this? How could this relate to previous findings of the authors (Bulgarelli et al. 2020), showing first an increase and then a decrease in functional connectivity between frontal and parietal regions?

      We acknowledge that interpreting the negative relationship between changes in growth and fronto-posterior FC at 24 months, alongside the positive association between the same connection and later cognitive flexibility, is challenging. We refrain from relating these findings to those published by Bulgarelli in 2020 due to differences in optode locations and because in that work the decrease in fronto-posterior FC was observed after 24 months (up to 36 months), whereas the endpoint in this study is right at 24 months.

      (6) With the growth of the head, the frontal channels move to more temporal areas, right? Could this determine the decrease in frontal inter-hemisphere connections?

      As shown by Nabwera et al. (2017), head size in Gambian infants increases less than expected from the WHO standard measures. We have added HCZ mean and SD values per age in Table 3.

      Minor points

      - HCZ is used in line 184 but not defined.

      We thank the reviewer for spotting this, we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Table SI2: NIRS not undertaken = the participant was assessed but did want or could not perform... I imagine there is a missing "not".

      We thank the reviewer for spotting this, we have now modified the legend of Table SI2 as follows: ‘the participant was assessed but did not want or could not perform the NIRS assessments.’

      - The authors should explain what weight-for-length is for those who are not familiar with it.

      We have added an explanation of weight-for-length in the experimental design section, line 339 as follows: ‘We then tested for relationships between brain FC at age 24 months with measures of early growth, as indexed by changes in weight-for-length z-scores (reflecting body weight in proportion to attained growth in length) at one month of age, and at each of the four subsequent visits (details provided below).’

      Reviewer #2 (Recommendations For The Authors):

      (1) I am confused about the authors' interpretation that left and right front-middle and right front-back FC increased with age. It appears in Figure 2 that the negative FC among these ROIs should actually decrease with age. This means that as individuals grow older, the FC values between these regions and zero diminished, albeit starting with negative FC (anticorrelation values) in younger age groups.

      Yes, the reviewer is correct. The negative values of the left and right front-middle and right front-back FC decreasing with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age.

      (2) Are these negative values mentioned above at 24 months still negative? Have t-tests been run to examine the differences from zero?

      As suggested, we performed t-tests against zero for the mentioned FC at 24 months; only the left and right fronto-middle FC differ significantly from zero (left fronto-middle FC: t(94) = 1.8, p = 0.036; right fronto-middle FC: t(94) = 2.7, p = 0.003).
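
      For readers unfamiliar with a test against zero, the following is a minimal sketch of a one-sample t statistic. It is an illustration only, not the authors' analysis code: the function name `one_sample_t` and the FC values are hypothetical, and the sketch returns only the t statistic and degrees of freedom. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_1samp`.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == popmean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    standard_error = sd / math.sqrt(n)
    return (mean - popmean) / standard_error, n - 1


# Made-up FC values (one per infant) for a single connection.
fc_values = [0.12, -0.05, 0.20, 0.03, 0.08, -0.02, 0.15, 0.01]
t_stat, df = one_sample_t(fc_values)
print(f"t({df}) = {t_stat:.2f}")
```

      With 95 infants contributing data, the degrees of freedom would be 94, matching the t(94) notation used in the response.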

      (3) With so many correlation analyses, have multiple comparisons been consistently controlled for? While I assume this was done according to the Methods section, could the authors clarify whether FDR adjustment was applied to all the p-values at once or to a group of p-values each time? I found the following way of reporting FDR-adjusted p-values quite informative, such as PFDR, 24 pairs of ROIs < 0.05.

      We thank the reviewer for this insightful comment. P-values of regression analyses were FDR corrected per connection investigated, i.e. 21 possible ΔWLZ values per connection. We have specified this in the method section as follows: “To ensure statistical reliability, results from the regression analyses on each FC were corrected for multiple comparisons using false discovery rate (FDR)(Benjamini & Hochberg, 1995) per each connection investigated, i.e. 21 possible ΔWLZ values per each connection,” (page 12, Statistical Analyses section).
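
      The per-connection correction described above can be sketched as follows. This is an assumed, minimal implementation of the Benjamini-Hochberg step-up adjustment, not the authors' code: `bh_adjust`, the connection names, and the p-values are all hypothetical, and each connection's p-values are treated as a separate family, as the response describes.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, grouped by connection; each group is one family.
pvals_by_connection = {
    "left front-middle": [0.01, 0.04, 0.03, 0.005],
    "right front-back": [0.20, 0.001, 0.50],
}
for connection, pvals in pvals_by_connection.items():
    adjusted = bh_adjust(pvals)
    surviving = [p for p in adjusted if p < 0.05]
    print(connection, len(surviving), "of", len(pvals), "survive FDR at 0.05")
```

      Correcting within each connection is less conservative than pooling every p-value into a single family, which is why the family definition is worth stating explicitly. Established libraries offer the same adjustment, for example `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"`.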

      (4) Can early growth trajectories predict changes in FC? Why not use ΔWLZ to predict ΔFC?

      Unfortunately, when considering ΔWLZ to predict ΔFC, the numbers of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing multiple measures.

      (5) I might have missed the rationale, but why weren't the growth changes after 5 months studied?

      ΔWLZ between all time points were assessed as predictors of FC at 24 months. We have specified this at line 183 as follows: ‘we used multiple regression with the infant growth trajectory (delta weight for length z-score between all time points, ΔWLZ) and FC at 24 months’. As indicated in Tables 2 and 3, the associations between ΔWLZ at all time points and FC at 24 months were tested, but only those with ΔWLZ calculated between birth and 1 month and the subsequent time points were significant. ΔWLZ between 5 months and the subsequent time points, between 8 months and the subsequent time points, between 12 months and the subsequent time points, and between 18 months and the subsequent time points did not significantly predict FC at 24 months. These are highlighted in Table 2 and Figure 3 in blue and marked as NS (non-significant).

      (6) Once more, the advantage of longitudinal data is that it allows us to tap into developmental changes. Analyzing and predicting cognitive development based solely on FC values at a single age stage (i.e., 24 months) would overlook the benefits of a longitudinal design, which is regrettable. I suggest that the authors attempt to use ΔFC for predictions and observe the outcomes.

      As mentioned in our response to point (4) raised by the reviewer, unfortunately, when considering ΔWLZ to predict ΔFC, the number of participants dropped significantly, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing various measures.

      (7) In the section "Early FC predicts cognitive flexibility at preschool age", the authors pointed out that "...,none of these survived FDR correction for multiple comparisons." However, the paper discussed the association between FC at 24 months of age and cognitive flexibility, as it was supported by the statistical analysis in the following sections. If FDR correction cannot be satisfied, I would rephrase the implication/conclusion of the results to suggest that early FC does not predict cognitive flexibility at preschool age.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings, even those not passing multiple comparisons corrections, as they may motivate hypothesis-generation for future studies. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further support these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.’ (line 290, discussion section).

      (8) Have the authors assessed the impact of growth trajectories on cognitive flexibility?

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      (9) Are there no other cognitive or behavioural measures available? Cognitive flexibility is just one domain of cognitive development, and would the impact of undernutrition on cognitive development be domain-specific? There is a lack of theoretical support here. Why choose cognitive flexibility, and should the impact of undernutrition be domain-specific or domain-general?

      We agree with the reviewer that in this work, we chose to focus on one specific cognitive outcome. While this does not imply that the impact of undernutrition is domain-specific, cognitive flexibility, being a core executive function, has been extensively studied in terms of its neural underpinnings using other neuroimaging modalities, especially fMRI (for example see Dajani, 2015; Uddin, 2021).

      Moreover, other studies looking at the effect of adversity on cognitive outcomes focus on specific cognitive skills, such as working memory (Roberts, 2017), reading and arithmetic skills (Soni, 2021).

      We also assessed infants with the Mullen Scales of Early Learning (MSEL), although the cognitive flexibility task within the Early Years Toolbox has been specifically designed for preschoolers (Howard, 2015), and this set of tasks has recently been validated by our team in The Gambia (Milosavljevic, 2023). Future work from the BRIGHT team will investigate performance on the MSEL in relation to other variables of the project.

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      L. Q. Uddin, Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

      Roberts, S. B., Franceschini, M. A., Krauss, A., Lin, P. Y., de Sa, A. B., Có, R., ... & Muentener, P. (2017). A pilot randomized controlled trial of a new supplementary food designed to enhance cognitive performance during prevention and treatment of malnutrition in childhood. Current developments in nutrition, 1(11), e000885.

      Soni, A., Fahey, N., Bhutta, Z. A., Li, W., Frazier, J. A., Moore Simas, T., ... & Allison, J. J. (2021). Early childhood undernutrition, preadolescent physical growth, and cognitive achievement in India: A population-based cohort study. PLoS Medicine, 18(10), e1003838.

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Milosavljevic, B., Cook, C. J., Fadera, T., Ghillia, G., Howard, S. J., Makaula, H., ... & Lloyd‐Fox, S. (2023). Executive functioning skills and their environmental predictors among pre‐school aged children in South Africa and The Gambia. Developmental Science, e13407.

      (10) I would review more previous fNIRS studies on infants if they exist (e.g., the work by S Lloyd-Fox, L Emberson, and others). These studies can help identify brain ROIs likely linked to undernutrition and cognitive flexibility. The current analysis methods lean towards exploratory research. This makes the paper more of a proof-of-concept report rather than a strongly theoretically-driven study.

      We thank the reviewer for this important point. While we have reviewed existing fNIRS infant studies, there are no extant works that showed whether specific brain regions are related to undernutrition. However, several fMRI studies assessed regions that do support cognitive flexibility, and we mentioned these in the manuscript (for example see Dajani, 2015; Uddin, 2021).

      Other than the BRIGHT project, we are aware of two other projects that assessed the effect of undernutrition on brain development, assessing cognitive outcomes in poor-resource settings:

      - the BEAN project in Bangladesh in which fNIRS data were recorded from the bilateral temporal cortex (i.e. Pirazzoli, 2022);

      - a project in India in which fNIRS data were recorded from frontal, temporal and parietal cortex bilaterally (i.e. Delgado Reyes, 2020)

      The brain regions recorded in these studies largely overlap with the brain regions we recorded from in this study.

      Another aspect to consider is that infants underwent several fNIRS tasks as part of the BRIGHT project, focusing on social processing, deferred imitation, and habituation responses. Therefore, brain regions for data acquisition were chosen to maximize the likelihood of recording meaningful data for all tasks (Lloyd-Fox, 2023). To clarify the text, we specified this information in the methods section (line 383).

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      Pirazzoli, L., Sullivan, E., Xie, W., Richards, J. E., Bulgarelli, C., Lloyd-Fox, S., ... & Nelson III, C. A. (2022). Association of psychosocial adversity and social information processing in children raised in a low-resource setting: an fNIRS study. Developmental Cognitive Neuroscience, 56, 101125.

      Delgado Reyes, L., Wijeakumar, S., Magnotta, V. A., Forbes, S. H., & Spencer, J. P. (2020). The functional brain networks that underlie visual working memory in the first two years of life. NeuroImage, 219, Article 116971.

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      (11) Last but not least, in the paper, the authors mentioned that fNIRS offers better spatial resolution and anatomical specificity compared to EEG, thereby providing more precise and reliable localization of brain networks. While I partially agree with this perspective, it remains to be explored whether the current fNIRS analysis strategies indeed yield higher spatial resolution. It is hoped that the authors will delve deeper into this discussion in the paper.

      The brain regions of focus were selected based on coregistration work previously conducted at each time point on the array used in this project (Collins-Jones, 2021). We deliberately avoided making claims about small brain regions, considering that head size might increase slightly less with age in The Gambia compared to Western countries (Nabwera, 2017). However, we maintain that the conclusions drawn in this study offer higher brain-region specificity than could have been identified with current common EEG methods alone.

      References:

      L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021).

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      - Among important developmental mechanisms to mention are the development of exuberant connections and the further selection/stabilization of the relevant ones according to environmental stimulation, vs the pruning of others.

      We agree with the reviewer that the development of exuberant connections and subsequent pruning is a universal process of paramount importance during the first years of life. However, after revising our introduction, given the word limit of the journal, we maintained focus on neurodevelopment and early adversity.

      Results

      - Adding a few more information on the 6 sections and 21 connections would be welcome. In particular for within-section FC: how was this computed?

      The 6 sections were created based on the co-registration of the array used in this study at each age in previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’.

      The 21 connections were defined as all the possible links between the 6 regions, specifically: the interhemispheric homotopic connections (in orange in Figure SI1), which connect the same regions between hemispheres (i.e., front left with front right); the intrahemispheric connections (in green in Figure SI1), which correlate channels belonging to the same region; the fronto-posterior connections (in blue in Figure SI1), which link front and middle, middle and back, and front and back regions of the same hemisphere; and the crossing interhemispheric connections (non-homotopic interhemispheric, in yellow in Figure SI1), which link the front, middle, and back areas between left and right hemispheres. We added these specifications also in the legend of Figure SI1 for clarity.
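
      The count of 21 follows from pairing 6 regions with themselves and with each other (6 within-region plus 15 between-region pairs). As a minimal sketch, the enumeration and the four categories described above can be reproduced as below; the region labels and the `classify` helper are hypothetical names chosen for illustration, not identifiers from the manuscript.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Six regions: front, middle and back in each hemisphere.
regions = [
    f"{side}-{part}"
    for side in ("left", "right")
    for part in ("front", "middle", "back")
]


def classify(region_a, region_b):
    """Assign a region pair to one of the four connection categories."""
    side_a, part_a = region_a.split("-")
    side_b, part_b = region_b.split("-")
    if region_a == region_b:
        return "intrahemispheric (within-region)"
    if side_a == side_b:
        return "fronto-posterior"
    if part_a == part_b:
        return "interhemispheric homotopic"
    return "crossing interhemispheric"


# Unordered pairs, including each region paired with itself.
connections = [
    (a, b, classify(a, b))
    for a, b in combinations_with_replacement(regions, 2)
]
counts = Counter(kind for _, _, kind in connections)
print(len(connections), dict(counts))
```

      Under this reading, the 21 connections split into 6 within-region, 6 fronto-posterior, 3 homotopic interhemispheric, and 6 crossing interhemispheric connections.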

      - The denomination intrahemispheric vs fronto-posterior vs crossed connections is not clear. Maybe prefer intra-hemispheric vs inter-hemispheric homotopic vs inter-hemispheric non-homotopic (also in Figure SI1).

      We appreciate the reviewer's suggestion regarding terminology. However, we believe that the term 'inter-hemispheric non-homotopic' could potentially refer to both connections within the same brain hemisphere from front to back and connections crossing between hemispheres, leading to increased confusion. Therefore, we have chosen not to include the term 'non-homotopic' and instead added 'homotopic' to 'interhemispheric' throughout the manuscript to emphasize that these functional connections occur between corresponding regions of the two hemispheres.

      - with time -> with age.

      We replaced “with time” with “with age” throughout the manuscript, as suggested.

      - The description of both HbO2 and HHb results overloads the main text: would it be relevant to present one of the two in Supplementary Information if the results are coherent?

      We understand the reviewer’s concern regarding overloading the results section with reporting both chromophores. However, reporting results for both HbO and HHb is considered a crucial step for publications in the fNIRS field, as emphasized in recent formal guidance (Yücel et al., 2020). One of the strengths of fNIRS compared to fMRI is its ability to record from both chromophores, enabling a more precise characterization of brain activations and oscillations. Moreover, in FC studies like this one, ensuring that HbO and HHb results overlap is an important check that increases confidence in interpreting the findings.

      References:

      Yücel, M. A., von Lühmann, A., Scholkmann, F., Gervain, J., Dan, I., Ayaz, H., Boas, D., Cooper, R. J., Culver, J., Elwell, C. E., Eggebrecht, A. T., Franceschini, M. A., Grova, C., Homae, F., Lesage, F., Obrig, H., Tachtsidis, I., Tak, S., Tong, Y., … Wolf, M. (2020). Best Practices for fNIRS publications. Neurophotonics, 1–34. https://doi.org/10.1117/1.NPh.8.1.012101

      - HCZ is not defined when first used.

      We thank the reviewer for spotting this. We have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Choosing the analyzed measures to "maximize power" could be criticised.

      We appreciate the reviewer’s concern. However, correlating all the FC values with all changes in growth would have raised an important issue for multiple comparisons. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and those FC that showed a significant change with age, considering these as the most interesting ones from a developmental perspective in our sample.

      Discussion

      - I would recommend using the same order to synthesize results and further discuss them.

      We agree with the reviewer that the suggested structure is optimal for a clear discussion section. We have indeed followed it, with each paragraph covering specific aspects:

      - Recap of the study aims

      - Results summary and discussion of developmental changes

      - Results summary and discussion of the relationship between changes in growth and FC

      - Results summary and discussion of the relationship between FC and cognitive flexibility

      - Limitations

      - Conclusion

      Given the numerous results presented in this paper, we believe that readers will better digest them by first reading a summary of the results followed by their interpretations, rather than condensing all the interpretations together.

      - Highlighting how "atypical" developmental trajectories are in Gambian infants would be welcome in the Results section. Other interpretations can be found than "The observed decrease in frontal inter-hemispheric FC with increasing age may be due to the exposure to early life undernutrition adversity".

      We agree with the reviewer that other factors that differ between low- and high-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to further investigate cultural, environmental, and genetic effects on brain FC” (line 238).

      - Focusing on FC at 24m for the relationship with growth is questionable.

      Correlating the FC values at 5 time points with all changes in growth would have raised an important issue for multiple comparisons. We therefore made an a priori decision to focus on investigating the relationship between changes in growth and FC at 24 months as our final time point of data collection. We added this information in the methods section as follows: “To investigate the impact of undernutrition on FC development, we used DWLZ as independent variables in regression analyses on HbO2 (as the chromophore with the highest signal-to-noise ratio) FC at 24 months, our final time point of data collection” (line 517, method section).

      - There is too much emphasis on the correlation between FC and cognitive flexibility, whereas results are not significant after correction for multiple comparisons.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample.’ (line 290, discussion section).

      Methods

      - I would recommend detailing how z-scores were computed in the paragraph "Anthropometric measures".

      We specified how z-scores were computed in the statistical analysis section as follows: “Anthropometric measures were converted to age and sex adjusted z‐scores that are based on World Health Organization Child Growth Standards (93). Weight‐for‐Length (WLZ) and Head Circumference (HCZ) z-scores were computed” (line 509, method section). As transforming data is the first step of statistical analysis and is not directly related to data collection, we believe it is more appropriate to retain this description in the statistical analysis section.

      - FC computation: the mention of "correlating the first and the last 250s" is not clear.

      We specified this more clearly in the text as follows: ‘We found that correlating the first and the last 250 seconds of valid data after pre-processing provided the highest percentage of infants with strong correlation between the first and the last portion of data’ (line 467).

      - The manuscript mentions "age 3 years" for the younger preschoolers but ~48months rather corresponds to 4 years.

      We revised the entire manuscript and the supplementary materials, but we could not find any instance in which preschoolers are referred to by age in months rather than in years.

      - Specify the number of children evaluated at 4 and 5 years. Is the test of cognitive flexibility normalized for age? If not, how were the 2 groups considered in the analyses? (age as a confounding factor).

      We have added the number of children in the two preschooler groups as follows: younger preschoolers (age mean ± SD=47.96 ± 2.77 months, N=77) and older preschoolers (age mean ± SD=57.58 ± 2.11 months, N=84). (line 484).

      The cognitive flexibility test was not normalized for age, as this task was specifically developed for preschoolers (Howard, 2015). As mentioned in ‘Cognitive flexibility at preschool age’ of the methods section, “data were collected in two ranges of preschool ages”, which guided our decision to perform regression analysis on the impact of FC on cognitive flexibility separately within these two age groups, rather than treating them as a single group of preschoolers.

      References:

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Figures and Tables

      - Table 1 could highlight the significant results. It is not clear what the "baseline" results correspond to.

      We have marked in bold the results that are statistically significant in Table 1. In the linear mixed model we performed, the first time point (i.e. 5 months) is chosen as ‘baseline’, i.e. the reference against which the other time points are compared, and its statistical values refer to its significance against 0 (as was done in Bulgarelli 2020).

      - Figures 2 B and C seem redundant? What is SE vs SD?

      We believe that both Figures 2B and 2C are useful for the readers. While the first one shows the mean FC values at the group level, the second one highlights the individual variability of FC values (typical of infant neuroimaging data), which is also why it is interesting to relate these measures to other variables of our dataset (i.e. growth and cognitive flexibility). Figure 2C also reports mean FC values per age, but these might be less visible because one dot per infant is also plotted.

      SE stands for standard error, and in the legend of the figure we specified this as follows: ‘Mean and standard error of the mean (SE)’. SD stands for standard deviation, and we have now specified this as follows: ‘mean ± standard deviation (SD)’.

      - Table 2: I would recommend removing results that don't survive corrections for multiple comparisons.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      - Figure 3: the top is redundant with Table 2: to be merged? B: the statistical results might be shown in a Table.

      We agree with the reviewer that the top part of Figure 3 and Table 2 report the same results. However, given the richness of these findings, we believe that the top part of Figure 3 serves as a useful summary for readers. Additionally, examining both the top and bottom parts of Figure 3 provides a comprehensive overview of the regression analysis conducted in this study.

      - Figure SI6: Is it really a % in x-axis?

      We thank the reviewer for spotting this typo; the percentage is relevant for the y-axis only. We removed the % symbol from the x-axis ticks.

      - Table SI1: the presented p-values don't seem to survive Bonferroni correction, contrary to what is written.

      We thank the reviewer for spotting this mistake. We removed the reference to the Bonferroni correction for the p-values.

      - Table SI2: For the proportion of children included in the analysis, maybe be precise that the proportion was computed based on the ones with acquired data. Maybe also add the proportion according to all children, to better show the high drop-out rate at certain ages?

      We thank the reviewer for these useful suggestions. We have specified in the legend of the table how we calculated the proportion of infants included as follows: ‘The proportion of children included in the analysis was computed based on the infants with FC data’. We have also added a column in the table called ‘Inclusion rate (from the 204 infants recruited)’, following the reviewer’s suggestion. This will be a useful reference for future studies.

      - A few typos should be corrected throughout the manuscript.

      We thoroughly revised the main manuscript and the supplementary materials for typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimulus to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds.

      They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The authors' findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      We thank the reviewer for the close reading of the manuscript and the many constructive comments and critiques. As the reviewer notes, there have been many prior studies of related circuits in other sensorimotor systems forming an important context for our study and findings, as we have tried to highlight. We appreciate the suggestions for additional relevant articles to cite.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      We have analyzed the activity patterns as a function of cortical depth, and now include these results in the manuscript as suggested. The key new finding is that the M1 responses are strongest in upper layers, consistent with expectations based on the excitatory corticocortical synaptic connectivity characterized previously. Changes to the manuscript include new figures (Figure 5; Figure 5 - figure supplement 1), which we explain (Methods: page 14, lines 618-621), describe (new Results section: pages 4-5, lines 183-189), comment on (Discussion: page 9, lines 378-391), and summarize the significance of (Abstract: page 1, lines 22-24). In addition, we incorporated the new laminar analysis into a summary schematic (Figure 9). We thank the reviewer for suggesting this analysis.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Do the authors' data support these in vitro observations?

      These experiments were relatively complex and M1 optotagging was not routinely included in the stimulus and acquisition protocol. Therefore, we don’t have sufficient data for this analysis. We plan to address this in future studies.

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback.

      We agree that this is of interest but consider this to be outside the scope of the current study.

      I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement?

      We have performed additional experiments to address this point. A constraint with EMG is that it is limited to the muscle(s) one chooses to record from, and it is difficult to implant electrodes in the tiny muscles of the hand. Therefore, for this analysis, we used kilohertz videography as a high-sensitivity method for movement surveillance across the hand. Hand stimulation did not evoke any detectable movements. Changes in the manuscript include: revised Figure 1 - figure supplement 1; supplementary Figure 1 - video 1; and associated text edits in the Methods (page 13, line 557; page 14, lines 626-639) and Results sections (page 2, lines 84-85).

      A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

      As we now say in the Methods > Optogenetic photostimulation of the hand section (page 13, lines 562-565), “This intensity was chosen based on pilot experiments in which we varied the LED power, which showed that this intensity was reliably above the threshold for evoking robust responses in both S1 and M1 without evoking any visually detectable movements (as subsequently confirmed by videography)”.

      Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Thank you for pointing this out. The prior studies suggest it is mainly a subset of layer 5B excitatory neurons that may express PV. We checked this in two ways. Anatomically, we did not find double-labeling. An electrophysiology assay showed that, although some evoked excitatory synaptic input could be detected in some neurons, these inputs were very weak. Results from these assays are shown in new Figure 6 - figure supplement 1, with associated text edits in the Methods (page 11, lines 469-471; page 15, lines 657-668) and Results (page 5, lines 198-199) sections.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      As noted above, we have performed additional experiments to address this.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      Thank you for pointing this out; we now cite this article (page 1, line 46; page 10, line 415).

      Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes that is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~ 15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      As noted above, we have performed additional experiments to address this point.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      Thank you for noting this. Although the absolute firing rates are not essential for the main findings or conclusions (which as noted focus on response modulations and relative differences) we agree that analyzing the single-unit response amplitudes is useful. Therefore, changes in the manuscript now include: revised Figure 3, and associated text edits in the Methods (page 12, lines 543-545), Results (page 3, lines 115-119), and Discussion (page 7, lines 305-311) sections.
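
      The distinction drawn in this exchange, that merged multi-unit clusters inflate absolute firing rates while leaving relative response modulation interpretable, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the firing rates are made up, the merged units are taken to be identical, and `modulation_index` is a generic normalized measure, not necessarily the metric used in the manuscript.

```python
def modulation_index(baseline_hz, evoked_hz):
    """Normalized response change in [-1, 1]; a generic, hypothetical measure."""
    return (evoked_hz - baseline_hz) / (evoked_hz + baseline_hz)

# Hypothetical single unit (rates in spikes/s; values are illustrative only).
single = {"baseline_hz": 4.0, "evoked_hz": 12.0}

# A multi-unit cluster modeled as n_merged identical single units whose
# spikes were not separated by sorting: every rate scales by n_merged.
n_merged = 3
multi = {name: rate * n_merged for name, rate in single.items()}

single_mi = modulation_index(**single)
multi_mi = modulation_index(**multi)

print(f"evoked rate: single {single['evoked_hz']} Hz, multi {multi['evoked_hz']} Hz")
print(f"modulation index: single {single_mi}, multi {multi_mi}")
```

      Under these assumptions the absolute evoked rate triples while the normalized modulation is unchanged, which is why a difference in the fraction of multi-units between areas could bias a comparison of absolute rates but not one of relative modulation.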

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      Yes, this reflects the binning. We agree that this is potentially confusing and have removed these average plots below the raster plots, as the rasters alone suffice to demonstrate the result (i.e., that PV units are strongly activated and thus tagged by optogenetic stimulation). Changes are now reflected in revised Figure 6.
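
      The binning effect acknowledged in this response can be reproduced with a minimal sketch. The spike times, trial count, bin width, and bin edges are all hypothetical (the response does not state the binning parameters that were used); `psth` and `onset_edge` are illustrative helper names, not functions from the authors' analysis code.

```python
import numpy as np

def psth(spike_times_ms, edges_ms, n_trials):
    """Trial-averaged firing rate (spikes/s) in each time bin."""
    counts, _ = np.histogram(spike_times_ms, bins=edges_ms)
    bin_width_s = np.diff(edges_ms) / 1000.0
    return counts / (n_trials * bin_width_s)

def onset_edge(rate, edges_ms):
    """Left edge (ms) of the first bin with a non-zero rate."""
    return edges_ms[np.flatnonzero(rate > 0)[0]]

# Hypothetical data: one spike per trial, all shortly AFTER stimulus onset (t = 0).
spikes_ms = np.array([1.0, 2.0, 2.5, 3.0, 4.0])
n_trials = 5

# 10-ms bins with an edge exactly at 0 ms versus bins that straddle 0 ms.
aligned_edges = np.arange(-20, 21, 10)      # -20, -10, 0, 10, 20
straddling_edges = np.arange(-25, 26, 10)   # -25, -15, -5, 5, 15, 25

rate_aligned = psth(spikes_ms, aligned_edges, n_trials)
rate_straddling = psth(spikes_ms, straddling_edges, n_trials)

print("apparent onset, aligned bins:   ", onset_edge(rate_aligned, aligned_edges), "ms")
print("apparent onset, straddling bins:", onset_edge(rate_straddling, straddling_edges), "ms")
```

      With a bin edge at 0 ms, the rate first rises in the bin that starts at stimulus onset. When a bin straddles 0 ms, the same post-stimulus spikes fall into a bin that starts before onset, so a rate trace drawn from bin edges appears to rise early even though no spike precedes the stimulus.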

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

      In the figure plot legends, the “W.” has been removed. Changes are now reflected in revised Figure 7 and Figure 7 – figure supplement 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Did you filter the neural signals during acquisition? If so, please include these details in the results.

      Signals were bandpass-filtered (2.5 Hz to 7.6 kHz) at the hardware level at acquisition (with no additional software filtering applied), as now clarified in the Methods > Electrophysiological recordings section as requested (page 12, lines 525-526).

      Reviewer #2 (Recommendations for the authors):

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Please see above for our response to this issue.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      Please see above for our response to this issue.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      As noted above, we now cite this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to prove that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments use different methods, including Co-IP, RNAi, pulldown, SPR, flow cytometry, and immunofluorescence assays. In general, the results are clearly presented in the manuscript.

      Weaknesses:

      In the introduction and discussion sections, Yao et al. mainly focus on viral pathogens and fish in aquaculture, so the significance and novelty of the results presented in this manuscript appear limited and not broad in biology. The authors should better convey the likely impact of their work on the viral infection field, and perhaps also on the evolutionary field with the fish model.

      (1) Myosin is a big family, why did authors choose MYL3 as a candidate receptor for NNV?

      We appreciate your insightful question. We selected MYL3 as a candidate receptor based on a combination of proteomic screening, literature evidence, and functional validation. Increasing evidence implicates myosins in viral infections. For instance, myosin heavy chain 9 plays a role in multiple viral infections (Li et al., 2018), and non-muscle myosin heavy chain IIA has been identified as an entry receptor for herpes simplex virus-1 (Arii et al., 2010). Furthermore, myosin II light chain activation is essential for influenza A virus entry via macropinocytosis (Banerjee et al., 2014). Our previous studies hinted at a potential interaction between MYL3 and CP (Zhang et al., 2020). Huang et al. also reported, based on proteomic analysis of an immunoprecipitation (IP) assay, that Epinephelus coioides MYL3 might interact with native NNV CP (Huang et al., 2020). Our Co-IP and SPR analyses confirmed a direct interaction between MYL3 and the RGNNV CP. Based on these studies, we selected MYL3 as a candidate receptor for NNV.

      References

      Huang PY, Hsiao HC, Wang SW, Lo SF, Lu MW, Chen LL. 2020. Screening for the Proteins That Can Interact with Grouper Nervous Necrosis Virus Capsid Protein. Viruses 12:1–20.

      Li L, Xue B, Sun W, Gu G, Hou G, Zhang L, Wu C, Zhao Q, Zhang Y, Zhang G, Hiscox JA, Nan Y, Zhou EM. 2018. Recombinant MYH9 protein C-terminal domain blocks porcine reproductive and respiratory syndrome virus internalization by direct interaction with viral glycoprotein 5. Antiviral Res 156:10–20.

      Arii J, Goto H, Suenaga T, Oyama M, Kozuka-Hata H, Imai T, Minowa A, Akashi H, Arase H, Kawaoka Y, Kawaguchi Y. 2010. Non-muscle myosin IIA is a functional entry receptor for herpes simplex virus-1.

      Banerjee I, Miyake Y, Philip Nobs S, Schneider C, Horvath P, Kopf M, Matthias P, Helenius A, Yamauchi Y. 2014. Influenza A virus uses the aggresome processing machinery for host cell entry. Science 346:473–477.

      (2) What is the relationship between MmMYL3 and MmHSP90ab1 and other known NNV receptors? Why does NNV have so many receptors? Which one is supposed to serve as the key entry receptor?

      We acknowledge the functional diversity of receptors for NNV. MmHSP90ab1 and MmHSC70 have been identified as receptors involved in NNV entry through clathrin-mediated endocytosis (CME), whereas MYL3 facilitates entry via macropinocytosis. These pathways serve as complementary mechanisms for the virus to enter host cells, potentially enhancing infection efficiency. While HSP90ab1 facilitates CME, MYL3 promotes macropinocytosis, both of which are critical for viral internalization, but through distinct endocytic mechanisms.

      NNV likely utilizes multiple receptors to increase its host range and infection efficiency. The diversity of receptors ensures that the virus can infect a wide variety of host species. By employing HSP90ab1, HSC70, and MYL3, NNV can exploit different cellular pathways for entry, making it more adaptable to various host environments.

      Regarding the identification of a key entry receptor, we agree this is a critical unresolved question. While HSP90ab1/HSC70 appear essential for CME-mediated entry, our data suggest MYL3 plays a distinct role in macropinocytic uptake. To systematically evaluate receptor hierarchy, we initially proposed comparative knockout studies targeting these candidate genes. However, we must acknowledge that current technical limitations in marine fish models – particularly the extended generation time for stable knockout cell lines and challenges in maintaining viable cell cultures post-editing – have delayed these experiments. Nevertheless, we are actively exploring strategies to overcome these obstacles and will continue to refine our approach to address these questions in future research.

      (3) In vivo knockout of MYL3 using CRISPR-Cas9 should be conducted to verify whether the absence of MYL3 really inhibits NNV infection. Although it might be difficult to do it in marine medaka as stated by the authors, the introduction of zebrafish is highly recommended, since it has already been reported that zebrafish could serve as a vertebrate model to study NNV (doi: 10.3389/fimmu.2022.863096).

      As noted in our manuscript from line 374 to 384, marine medaka is a relatively new model for studying viral infections and is not yet optimized for CRISPR-Cas9-mediated gene knockout. The technical challenges related to precise embryo microinjection and off-target effects using CRISPR-Cas9 in marine medaka complicate the establishment of knockout lines. These limitations, including the time required for multiple breeding generations and molecular screening, currently make this approach difficult to implement.

      We fully agree with your suggestion to consider zebrafish as an alternative model. Zebrafish have been well-established as a vertebrate model for studying NNV, and their genetic tractability and well-developed CRISPR-Cas9 protocols provide a more accessible and efficient platform for generating knockout models. In our future studies, we plan to conduct CRISPR-Cas9-mediated knockout experiments targeting multiple NNV receptors in zebrafish. This will allow us to systematically evaluate the role of different receptors in NNV infection and elucidate their potential interactions. The findings from these studies will be included in a future publication, which will provide a more comprehensive understanding of the molecular mechanisms underlying NNV infection in vertebrate models.

      (4) The results shown in Figure 6 are not enough to support the conclusion that "RGNNV triggers macropinocytosis mediated by MmMYL3". Additional electron microscopy of macropinosomes (sizes, morphological characteristics, etc.) will be more direct evidence.

      A previous study reported that dragon grouper nervous necrosis virus (DGNNV) enters SSN-1 cells primarily through micropinocytosis and macropinocytosis pathways. Electron microscopy revealed several kinds of membrane ruffling and large, disproportionate macropinosomes in DGNNV-infected cells, indicating that NNV infection could trigger macropinocytosis (Liu et al., 2005). In our study, the data from inhibitor treatments, co-localization of MmMYL3 with RGNNV CP, and dextran uptake assays also provide compelling evidence for the involvement of macropinocytosis in RGNNV entry via MmMYL3. These methods are well-established in the literature and have been used extensively to study viral entry pathways (Lingemann et al., 2019). Specifically, the dextran uptake assay has been widely utilized as a marker for macropinocytosis and has provided clear evidence of RGNNV internalization via this pathway. The use of macropinocytosis inhibitors, such as EIPA and Rottlerin, significantly reduced RGNNV entry, further supporting our conclusion. Nonetheless, we acknowledge the potential value of additional electron microscopy studies and will consider this approach in our future research.

      References

      Liu W, Hsu CH, Hong YR, Wu SC, Wang CH, Wu YM, Chao CB, Lin CS. 2005. Early endocytosis pathways in SSN-1 cells infected by dragon grouper nervous necrosis virus, J Gen Virol.

      Lingemann M, McCarty T, Liu X, Buchholz UJ, Surman S, Martin SE, Collins PL, Munir S. 2019. The alpha-1 subunit of the Na+,K+-ATPase (ATP1A1) is required for macropinocytic entry of respiratory syncytial virus (RSV) in human respiratory epithelial cells, PLoS Pathogens.

      (5) MYL3 is "predominantly found in muscle tissues, particularly the heart and skeletal muscles". However, NNV is a virus that mainly causes necrosis of nervous tissues (brain and retina). If MYL3 really acts as a receptor for NNV, how does it balance this difference so that nervous tissues, rather than muscle tissues, have the highest viral titers?

      While MYL3 is highly expressed in cardiac and skeletal muscles, studies have shown that MYL3, like other myosin light chains, can also be present in non-muscle tissues. Additionally, proteins involved in viral entry do not always need to be the most highly expressed in the final target tissue, as long as they facilitate the initial infection process. For instance, rabies virus is a rhabdovirus that exhibits a marked neuronotropism in infected animals. Transferrin receptor protein 1 (TfR1) can serve as a receptor for rabies virus through the CME pathway, but TfR1 is expressed most abundantly in liver tissue, not in the nervous system (Wang et al., 2023).

      Viral tropism is often determined not only by the presence of an entry receptor but also by co-receptors, cellular factors, and post-entry mechanisms. While MYL3 may act as a receptor for NNV, other factors, such as cell-specific proteases, signaling molecules, and intracellular trafficking pathways, likely contribute to NNV’s preferential replication in the brain and retina.

      Reference

      Wang Xinxin, Wen Z, Cao H, Luo J, Shuai L, Wang C, Ge J, Wang Xijun, Bu Z, Wang J. 2023. Transferrin Receptor Protein 1 Is an Entry Factor for Rabies Virus. J Virol 97. doi:10.1128/jvi.01612-22

      Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of the NNV internalization process. This could pave the way for further exploration of antiviral targets.

      Thanks for your review.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

      Thanks for your review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 94: SPR analysis? The full name should be provided at first use.

      We have defined SPR when it first appears at line 97 in the revised manuscript.

      (2) Moreover, are nine main-text figures too many for one manuscript? Some of them might be moved to the supplementary file.

      We have merged the previous Fig 4 and Fig 5 and combined Fig 8 and Fig 9, reducing the number of figures to seven. For the specific details of the figure adjustments, please refer to the corresponding figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Expand on the potential therapeutic implications of targeting MYL3 or the IGF1R pathway in aquaculture settings. Including a discussion of how inhibitors could be developed or tested in future research would give practical context to the findings.

      Thanks for your valuable suggestion to expand on the therapeutic implications of targeting MYL3 and the IGF1R pathway in aquaculture. In response, we have discussed potential strategies for developing inhibitors, such as small molecules, peptides, or monoclonal antibodies targeting MYL3 to block its interaction with the viral capsid, and IGF1R inhibitors to prevent macropinocytosis-mediated viral entry. We propose using virtual screening platforms to identify these inhibitors, followed by in vivo testing in aquaculture models. Additionally, combining MYL3 and IGF1R inhibitors could provide a synergistic approach to enhance antiviral efficacy. The relevant discussions have been supplemented at lines 358 to 368 in the revised manuscript.

      (2) It is recommended to include the data regarding the lack of interaction between the CMNV CP and MmMYL3 as a supplementary figure.

      We have included supplementary data demonstrating that CMNV CP does not interact with MmMYL3, highlighting the specificity of MYL3 for RGNNV. For detailed information, please refer to Fig. S4.

      Reviewer #3 (Recommendations for the authors):

      Consider discussing the broader implications of these findings, particularly whether MYL3 might serve as a receptor for other viruses.

      We appreciate this suggestion. Receptor recognition is typically highly specific: binding between viral proteins and host receptors often depends on the structural compatibility between the viral capsid or envelope and the host receptor. Our study demonstrates that MYL3 serves as a receptor for NNV based on its direct interaction with the NNV capsid protein (CP). However, when we tested whether MYL3 interacts with CMNV (Covert Mortality Nodavirus), which is phylogenetically close to NNV, we found that CMNV CP does not bind to MYL3. Since MYL3 does not interact with CMNV, a virus closely related to NNV, it is less likely to function as a receptor for viruses that are more distantly related to NNV. The relevant discussions have been supplemented at lines 306 to 310 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      Although there is a sizeable amount of data reported in the manuscript, there seems to be a chronic issue of lack of details of how some experiments were performed. This is particularly true in the figure legends, which for the most part lack enough details to allow comprehension without constant return to the text. Additionally, 2 figures are missing. Figure 6 is a repetition of Figure 5, and Figure S4 is an identical replicate of Figure S3.

      We gratefully appreciate your professional comments. Additional details on how the related experiments were performed have been added to the Materials and methods section and figure legends (e.g., see Line 478-487, Line 996-1001, Line 1010-1012, Line 1019-1020, Line 1031-1033, Line 1041-1042, Line 1051-1053, Line 1082-1083, Line 1087-1088, Line 1093-1094, Line 1105-1107, Line 1113-1114). Furthermore, we sincerely apologize for the mistakes and for the inconvenience they caused during your review, and we have added the correct Figure 6 (see Line 1043-1053) and Figure S4 (see Line 1084-1088). We will carefully and thoroughly check the whole submitted manuscript along with supplementary information to avoid such mistakes in the future.

      Reviewer #2 (Public review):

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays was carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have two main questions about this work.

      (1) The authors provided the below information about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What are the criteria to choose these samples? Where did these samples originate from? How many strains of bacteria were obtained from which types of samples?

      We apologize for the ambiguous and limited information. More details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      We apologize for the ambiguous and limited information. We have carefully revised this section, and more details have been added to the Materials and methods section (see Line 480-496). We gratefully appreciate your professional comments.

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consisted of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains, and 69 Enterococcus strains.

      We apologize for the ambiguous information. We have carefully revised this section and added more details (see Line 129-132). We gratefully appreciate your professional comments.

      (2) As a probiotic, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacteria in these products. There are also ATCC type strains such as 53103.

      I am sure the authors are also interested to know whether P118 is better as a probiotic candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus which are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the windows for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We strongly agree that it would be better to include well-known, recognized, or commercial probiotics as a positive control to comprehensively evaluate the isolated P118 strain as a probiotic candidate, particularly in comparison to other well-established probiotics. Such a comparison would also help assess whether the mechanisms described for P118 are applicable to other L. rhamnosus strains or to lactic acid bacteria in general. These issues will be fully taken into consideration and addressed in future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28 - The sentence "with great probiotic properties" suggests that this strain was already known to have probiotic properties. Is that the case?

      We gratefully appreciate your professional comments. The phrase "with great probiotic properties" in this context was intended as a summary of our findings, emphasizing that L. rhamnosus P118 exhibits great probiotic properties, as evaluated by traditional and C. elegans-infection screening strategies. We have revised this sentence (see Line 27-30).

      (2) Line 30 - What exactly do authors mean by "traditional"? They should add a bit more information here as to what these methods would be.

      We gratefully appreciate your professional comments. By "traditional" methods, we refer to time-consuming and labor-intensive strategies for screening probiotic candidates, which include bacterial isolation, culturing, phenotypic characterization, randomized controlled trials, and various in vitro and in vivo tests to assess probiotic properties (Sun et al., 2022). We have indicated this strategy in Line 91-94.

      Reference:

      Sun Y, Li HC, Zheng L, Li JZ, Hong Y, Liang PF, Kwok LY, Zuo YC, Zhang WY, Zhang HP. Iprobiotics: A machine learning platform for rapid identification of probiotic properties from whole-genome primary sequences. Briefings in Bioinformatics 2022;23.

      (3) Line 37 - I believe "harmful microbes" is not the correct term here. I suggest authors use "potentially harmful".

      Done as requested (see Line 36, 209, 212, 217, 381). We gratefully appreciate your valuable suggestions.

      (4) Line 75 - What exactly do authors mean by "irregular dietary consumption"?

      "irregular dietary consumption" means "irregular dietary habits" or " eating irregularly " or "abnormal eating behaviors". We had change to "irregular dietary habits" (see Line 76). We gratefully appreciate your professional comments.

      (5) Line 85 - What exactly do authors mean by "without residues in raw food products"?

      Here, "without residues in raw food products" means that probiotics barely remain in food animal products (e.g., meat, eggs, dairy) after livestock and poultry have been given feed supplemented with probiotics. We gratefully appreciate your professional comments.

      (6) Line 86 - Please, give a specific example of yeast.

      Done as requested (see Line 85-86), “yeast (e.g., Saccharomyces boulardii, S. cerevisiae)”. We gratefully appreciate your valuable suggestions.

      (7) Line 112 - Lactobacillus reuteri should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 112). We gratefully appreciate your valuable suggestions.

      (8) Lines 115-118 - Please, rewrite for clarity.

      Done as requested (see Line 115-118). We gratefully appreciate your valuable suggestions.

      (9) Line 118 -Lacticaseibacillus rhamnosus should be written out, since this is the first time the species name appears in the main text.

      Done as requested (see Line 118). We gratefully appreciate your valuable suggestions.

      (10) Line 119 - Throughout the text authors make it seem like strain P118 was previously known. Is that the case? If yes, how was it isolated again? This should be briefly mentioned in the introduction.

      We apologize for the misunderstanding caused by this statement. The P118 strain was isolated and its probiotic properties were evaluated by our lab; it was not previously known. We have revised this sentence (see Line 118-120). We gratefully appreciate your professional comments.

      (11) Line 131 - How were strains identified?

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) was employed to identify bacterial species (He et al., 2022). This information is given in the Materials and methods section (see Line 485-489). We gratefully appreciate your professional comments.

      Reference

      He D, Zeng W, Wang Y, Xing Y, Xiong K, Su N, Zhang C, Lu Y, Xing X. Isolation and characterization of novel peptides from fermented products of lactobacillus for ulcerative colitis prevention and treatment. Food Science and Human Wellness 2022;11:1464-74.

      (12) Figure 1 - Legend needs a lot more info. Where are legends to panels PQ? Also, some of the text is too small to read.

      We apologize for the limited information. We have revised the Figure 1 legend and added more details (see Line 1000-1019), and we also provide a vector graphic of Figure 1. We gratefully appreciate your professional comments.

      (13) Line 136 - All strains were screened and 27 strains were positive, right?

      Yes, all strains were screened and 27 strains were positive. We gratefully appreciate your professional comments.

      (14) Figure 2 - What do authors mean by "spleen index" and "liver index"? This should be described in more detail. Also, p values for 'a', 'b', 'ab' should be given.

      The organ indices (spleen index, liver index) were calculated according to the formula: organ index = organ weight (g) / body weight (g) × 1000, as indicated in the Materials and methods section (see Line 587-588). “Different lowercase letters ('a', 'b') indicate a significant difference (P < 0.05)” has been added in Line 1020-1029. We gratefully appreciate your professional comments.
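
      As a minimal sketch of the organ index formula quoted above, the calculation can be written as a small function. The function name and the example weights are hypothetical and serve only as an illustration; they are not values from the manuscript.

```python
def organ_index(organ_weight_g: float, body_weight_g: float) -> float:
    """Organ index = organ weight (g) / body weight (g) * 1000."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return organ_weight_g / body_weight_g * 1000


# Hypothetical animal: 0.08 g spleen and 20 g body weight (illustrative values only).
spleen_index = organ_index(0.08, 20.0)
print(f"spleen index = {spleen_index:.2f}")
```

      Because the index is a simple ratio scaled by 1000, it normalizes organ weight to body weight so that animals of different sizes can be compared.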

      (15) Line 212-214 - Again, I suggest authors use "potentially harmful" and "potentially beneficial".

      Done as requested (see Line 36, 210, 213, 218, 383). We gratefully appreciate your valuable suggestions.

      (16) Figure 3 - Which groups were tested in panels CD? Is this based on color? Legends should be restated in panels or clearly marked in the legend.

      Sorry for this mistake, we have revised and added group info in Figure 3C-D (see Line 1013-1020). We gratefully appreciate your professional comments.

      (17) Figure 4 - Lacks details.

      Sorry for the mistakes, we have revised and added group info in Figure 4D-E and legend (see Line 1031-1037). We gratefully appreciate your professional comments.

      (18) Figure 6 - This is a repetition of Figure 5.

      Sorry for the mistakes, we have added the correct Figure 6 (see Line 1060-1070). We gratefully appreciate your professional comments.

      (19) Lines 329-330 - C. elegans does not "mimic" animal intestinal physiology.

      Sorry for the mistakes, we have revised this statement (see Line 139-142, 324-325). We gratefully appreciate your professional comments.

      (20) Lines 358 and 418 - What do authors mean by "metabolic dysfunction" and "metabolic disorder"? I assume they mean changes in fecal metabolites. However, these are terms that may have different interpretations in the field of human metabolism. Therefore, I would suggest that the authors specify that they mean changes in fecal metabolite profiles when using these terms.

      We apologize for the confusion caused by these terms. We have revised the corresponding statements (see Line 34-35, 122, 353-354, 413). We gratefully appreciate your professional comments.

      (21) Line 475 - What do authors mean by "superficial effects"?

      We apologize for the mistake. We have changed this to “beneficial/protective effects” (see Line 469, Line 1074). We gratefully appreciate your professional comments.

      (22) Line 486 - Were all yogurts artisanal? Where were piglets from? How were samples collected? Feces, rectal swabs? Does the ethics statement at the end of the manuscript also cover work with piglets?

      Yes, all yogurts were artisanal. The 6 rectal content samples were collected from healthy piglets, free of pathogen infection and diarrhea, at a pig farm in Zhejiang province. Yes, the ethics statement at the end of the manuscript also covers the work with piglets.

      (23) Line 490 - Which MALDI platform was used? The database used can have important implications for strain identification. What was the confidence of ID? This should be included.

      Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) was employed to identify bacterial species with a confidence level > 90%. This information is given in the Materials and methods section (see Line 487-489). We gratefully appreciate your professional comments.

      (24) Line 501 - Is this a widely used method to characterize probiotics? Please, add a reference.

      Done as requested (see Line 498). Many probiotics and other microbes produce milk-clotting enzymes. Milk-clotting activity is an important measurement in the dairy industry, especially in cheese making (Zhang et al., 2023; Arbita et al., 2024; Shieh et al., 2009). Milk-clotting activity analysis is commonly used to evaluate the ability of candidate probiotic isolates to clot milk for cheese production.

      Reference:

      Zhang Y, Wang J, He J, Liu X, Sun J, Song X, Wu Y. Characteristics and application in cheese making of newly isolated milk-clotting enzyme from bacillus megaterium ly114. Food Res Int 2023;172:113202.

      Arbita AA, Zhao J. Milk clotting enzymes from marine resources and their role in cheese-making: A mini review. Crit Rev Food Sci Nutr. 2024;64(27):10036-10047.

      Shieh CJ, Phan Thi LA, Shih IL. Milk-clotting enzymes produced by culture of Bacillus subtilis natto. Biochemical Engineering Journal 2009;43(1):85-91.

      (25) Line 713 - How were fecal metabolites extracted?

      We apologize for the missing information. Details of the fecal metabolite extraction have been added to the Materials and methods section (see Line 705-706). We gratefully appreciate your professional comments.

      (26) Figure 7 - Please correct "macrophages".

      Done as requested (see Figure 7, Line 1072). We gratefully appreciate your valuable suggestions.

      (27) Table 1 - Should read "number of strains", not size.

      Done as requested (see Line 1084). We gratefully appreciate your valuable suggestions.

      (28) Figure S1B - Is this data for P118?

      Sorry for the mistakes, we have revised Figure S1 legend (see Line 1086-1088). We gratefully appreciate your professional comments.

      (29) Figure S3 - Legends C, S, PS, P are not specified.

      Sorry for the missed information, we have revised and added group info in Figure S3 legend (see Line 1095-1101). We gratefully appreciate your professional comments.

      (30) Figure S3B - What is the "clinical symptom score"? How was this determined?

      We apologize for the lack of information. The detailed information has been added to the Materials and methods section (see Line 659-661, Table S7). We gratefully appreciate your professional comments.

      (31) Figure S4 - This is an identical copy of Figure S3.

      Sorry for the mistakes, we have added the correct Figure S4 (see Line 1103-1106). We gratefully appreciate your professional comments.

      (32) Figure S5 - Legend lacks details.

      Sorry for the missed information, we have revised and added group info in Figure S5 legend (see Line 1107-1112). We gratefully appreciate your professional comments.

      (33) Figure S8 - What is "GM"? Since it inhibits growth to a greater extent than the highest metabolite concentration used, I imagine it must be an antibiotic (gentamycin?) as a positive control. This needs to be clearly stated.

      We apologize for the missing information. GM denotes 100 μg/mL gentamicin (see Line 1134). We gratefully appreciate your professional comments.

      (34) Figure S9 - Labels for panels are missing.

      We apologize for the missing information. The labels have been added (see Line 1135-1139). We gratefully appreciate your professional comments.

      Reviewer #2 (Recommendations for the authors):

      (1) This reviewer appreciates the efforts of the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way that is easy for the readers to follow.

      We have tried our best to revise and improve the whole manuscript to make it easy for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-132, Line 480-496). We gratefully appreciate your valuable suggestions.

      (2) For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or citing other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Line 497-530, Line 637-671).

      (3) Another example: the figures have great resolution, but they are way too busy. Figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We agree that some of the submitted figures are too busy. However, it is not easy to move results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Line 1000-1020, Line 1052-1071). We gratefully appreciate your professional comments and valuable suggestions.

      (4) Line 30: spell out "C." please.

      Done as requested (see Line 31). We gratefully appreciate your valuable suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca<sup>2+</sup> channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was a possible involvement of Ca<sup>2+</sup> channel inactivation in the strong paired pulse depression (PPD). In the meantime, we measured total (free plus buffered) calcium increments induced by each of the first four APs in 40 Hz trains at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca<sup>2+</sup> increments were not different from one another, arguing against a possible contribution of Ca<sup>2+</sup> channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue was on the definition of ‘vesicular probability’. Previously, vesicular probability (p<sub>v</sub>) has been used with reference to the releasable vesicle pool which includes not only tightly docked vesicles but also reluctant vesicles. On the other hand, the meaning of p<sub>v</sub> in the present study is the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Below, we provide our point-by-point replies to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory. Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Reviewer #2 (Public review):

      Summary:

      Shin et al aim to identify in a very extensive piece of work a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply-demand aspects. The methods applied are state-of-the-art and both address quantitative aspects with high signal to noise. In addition, the authors examine both excitatory output onto glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that loss of Synaptotagmin7 function and the role of the Ca dependent phospholipase activity seem critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      (1) While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * p<sub>v</sub>, where n = RRP size and p<sub>v</sub> = vesicular release probability. The value for p<sub>v</sub> critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRP<sub>hyper</sub>) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured p<sub>v</sub> based on AP-evoked EPSCs such as strong paired pulse depression (PPD) and associated failure rates, p<sub>v</sub> in the present study denotes vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (p<sub>occ</sub>) by TS vesicles is subject to dynamic regulation by forward and backward rate constants (denoted by k<sub>1</sub> and b<sub>1</sub>, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and p<sub>occ</sub>, among which N is a fixed parameter but p<sub>occ</sub> depends on k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca<sup>2+</sup> (Hosoi et al., 2008), p<sub>occ</sub> is not a fixed parameter. Therefore, release probability should be re-defined as p<sub>occ</sub> * p<sub>v</sub>. Given that N is fixed, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in p<sub>occ</sub> rather than p<sub>v</sub>, because p<sub>v</sub> is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in p<sub>v</sub> of reluctant vesicles.
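
      The factorization described above can be made concrete with a minimal sketch. It assumes only the relations stated in the text: p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) at rest, release probability = p<sub>occ</sub> * p<sub>v</sub>, and m = N * p<sub>occ</sub> * p<sub>v</sub>. The function names and all numerical values are hypothetical and chosen purely for illustration; they are not parameters from the manuscript.

```python
def steady_state_occupancy(k1: float, b1: float) -> float:
    """Resting occupancy of release sites by TS vesicles: k1 / (k1 + b1)."""
    if k1 < 0 or b1 < 0 or (k1 + b1) == 0:
        raise ValueError("rate constants must be non-negative and not both zero")
    return k1 / (k1 + b1)


def release_probability(p_occ: float, p_v: float) -> float:
    """Release probability per site, redefined as p_occ * p_v."""
    return p_occ * p_v


def quantal_content(n_sites: int, p_occ: float, p_v: float) -> float:
    """Quantal content m = n * p_v, with n = N * p_occ TS vesicles."""
    return n_sites * p_occ * p_v


def max_facilitation(p_occ_baseline: float) -> float:
    """Upper bound on facilitation when p_v is fixed and p_occ can rise at most to 1."""
    return 1.0 / p_occ_baseline


# Hypothetical values for illustration only (not taken from the manuscript).
k1, b1 = 3.0, 7.0      # forward and backward rate constants (1/s)
n_sites, p_v = 5, 0.9  # number of release sites, fusion probability of TS vesicles

p_occ = steady_state_occupancy(k1, b1)
print(f"baseline p_occ          = {p_occ:.2f}")
print(f"release probability     = {release_probability(p_occ, p_v):.2f}")
print(f"quantal content m       = {quantal_content(n_sites, p_occ, p_v):.2f}")
print(f"max facilitation (fold) = {max_facilitation(p_occ):.2f}")
```

      Under these assumptions, facilitation is bounded by 1/p<sub>occ</sub> at baseline when p<sub>v</sub> is fixed, so a 2.3-fold STF would require a baseline p<sub>occ</sub> of at most about 0.43.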

      We imagine that the Reviewer meant vesicular release or fusion probability (p<sub>v</sub>) by ‘release probability’. If so, p<sub>v</sub> (of TS vesicles) cannot be a major player in STF, because the baseline p<sub>v</sub> is already higher than 0.8 even if it is most parsimoniously estimated (Fig. 2). Moreover, considering very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that p<sub>v</sub> is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca<sup>2+</sup>-dependent step increase in p<sub>v</sub> of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that p<sub>v</sub> of TS vesicles is close to one, an increase in p<sub>v</sub> of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in fusion probability of LS vesicles (denoted as p<sub>v,LS</sub>) should be considered in two ways depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that p<sub>v,LS</sub> is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in p<sub>v,LS</sub> of LS vesicles that reside in a distinct pool. Because the increase in p<sub>v,LS</sub> during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.
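
      The variance-mean argument above can be illustrated with a minimal sketch under a simple binomial release model. This model is an assumption made here for illustration and is not necessarily the exact analysis used in the manuscript. With N release sites, a release probability p per site and quantal size q, the mean is M = Npq and the variance is V = Np(1 − p)q², so that V = qM − M²/N. The function names and numerical values are hypothetical, and quantal variability is ignored.

```python
def binomial_mean_variance(n_sites, p_release, q):
    """Mean and variance of the EPSC amplitude under a simple binomial model."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance


def parabola_variance(mean, n_sites, q):
    """Variance predicted by the parabola V = q*M - M**2 / N for fixed N and q."""
    return q * mean - mean ** 2 / n_sites


# Hypothetical parameters for illustration only (not taken from the manuscript).
N, Q = 5, 10.0  # number of release sites, quantal size (pA)

# Case 1: release probability rises with N fixed, so points stay on one parabola.
for p in (0.3, 0.5, 0.7):
    m, v = binomial_mean_variance(N, p, Q)
    print(f"p={p:.1f}  mean={m:6.1f}  var={v:6.1f}  parabola={parabola_variance(m, N, Q):6.1f}")

# Case 2: new sites are recruited (N doubles), so the point lies above the original parabola.
m2, v2 = binomial_mean_variance(2 * N, 0.3, Q)
print(f"N doubled: mean={m2:.1f}  var={v2:.1f}  original parabola={parabola_variance(m2, N, Q):.1f}")
```

      In this sketch, a change in release probability alone moves the data along a single parabola, whereas recruitment of additional sites produces the upward deviation described in the text.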

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in p<sub>v,LS</sub> that occurs in parallel with STF. Strong PPD, indicative of high p<sub>v</sub>, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in p<sub>v,LS</sub>. One may argue that STF may be mediated by a drastic step increase of p<sub>v,LS</sub> from zero to one, but it is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into Discussion and further clarified the reasoning behind our conclusions.

      References

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Please note that papers cited in the manuscript are not repeated here.

      (2) Fig 3 I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm bias the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition of release probability, it can be factored into p<sub>v</sub> and p<sub>occ</sub>, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= p<sub>occ</sub> × p<sub>v</sub>). Our novel finding is that the increase in release probability should be attributed to an increase in p<sub>occ</sub>, not to that in p<sub>v</sub>.

      (3) Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the fast releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, it is expected that the maximal STF (ratio of facilitated to baseline EPSCs) becomes lower as long as the number of release sites (N) is limited. As aforementioned, the baseline p<sub>v</sub> is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline p<sub>occ</sub> seems to be increased by OAG.

      Reference

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      (4) The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial p<sub>v</sub>, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial p<sub>v</sub>. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial p<sub>v</sub> has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial p<sub>v</sub>, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial p<sub>v</sub> already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial p<sub>v</sub>. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on p<sub>v</sub>.

      References

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      (1) The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy, and we added an explanation of this difference to the Discussion. In summary, p<sub>v</sub> in the present study denotes the vesicular release probability of TS vesicles, whereas the previous studies used it for the release probability of vesicles in the RRP, which includes both LS and TS vesicles. Accordingly, p<sub>occ</sub> in the present study is the occupancy of release sites by TS vesicles.

      Not only the double failure rate but also the other failure rates upon paired-pulse stimulation were best fitted with p<sub>v</sub> close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high p<sub>v</sub>, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig. 3D), and in the recovery phase after three-pulse bursts (Fig. 7). Given that p<sub>v</sub> is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only through an increase in p<sub>occ</sub> (occupancy of release sites by TS vesicles). In addition, Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three-pulse bursts, indicating that vesicles recovered after depletion also display high p<sub>v</sub>. Knockdown of Syt7 slowed this recovery, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our replies to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results support neither the recruitment of new release sites (an increase in N) with low p<sub>v</sub> nor a gradual increase in p<sub>v</sub> of reluctant vesicles during short-term facilitation.

      The following statement was added to the Discussion in the revised manuscript:

      “Previous studies suggested that an increase in p<sub>v</sub> is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked 20 Hz/60 pulses train (denoted as ‘RRP<sub>hyper</sub>’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high p<sub>v</sub> based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, p<sub>v</sub> in the present study indicates the fusion probability of TS vesicles. From the same reasons, p<sub>occ</sub> denotes the occupancy of release sites by TS vesicles. Note that our study does not provide direct clue whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles are kept constant, an increase in p<sub>occ</sub> (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRP<sub>hyper</sub>.”

      (2) Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca<sup>2+</sup> channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca<sup>2+</sup> channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca<sup>2+</sup> channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca<sup>2+</sup>-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca<sup>2+</sup>-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca<sup>2+</sup> to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca<sup>2+</sup> (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca<sup>2+</sup> channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca<sup>2+</sup> channel inactivation. This interpretation seems to rest on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k<sub>1</sub> and b<sub>1</sub>, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be attributed solely to Ca<sup>2+</sup> channel inactivation. Nevertheless, we acknowledge that Ca<sup>2+</sup> channel inactivation may contribute to PPD, and we have therefore investigated this possibility further. Specifically, we measured action potential (AP)-evoked Ca<sup>2+</sup> transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using a two-dye ratiometry technique. Our analysis revealed no evidence for Ca<sup>2+</sup> channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca<sup>2+</sup> channel inactivation is unlikely to contribute to the pronounced PPD.

      Figure 2—figure supplement 2 shows how we measured the total Ca<sup>2+</sup> increments at axonal boutons. First we estimated endogenous Ca<sup>2+</sup>-binding ratio from analyses of single AP-induced Ca<sup>2+</sup> transients at different concentrations of Ca<sup>2+</sup> indicator dye (panels A to E). And then, using the Ca<sup>2+</sup> buffer properties, we converted free [Ca<sup>2+</sup>] amplitudes to total calcium increments for the first four AP-evoked Ca<sup>2+</sup> transients in a 40 Hz train (panels G-I). We incorporated these results into the revised version of our manuscript to provide evidence against the Ca<sup>2+</sup> channel inactivation.

      (3) On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca<sup>2+</sup>-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in p<sub>fusion</sub> of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (p<sub>r</sub>) within a single active zone rather than a heterogeneity in p<sub>fusion</sub> of individual docked vesicles. Therefore both p<sub>occ</sub> and p<sub>v</sub> of TS vesicles would contribute to the p<sub>r</sub> distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). In addition, strong PPD was observed not only at baseline but also during the early and late phases of facilitation, indicating that vesicles with very high p<sub>v</sub> contribute to EPSCs throughout train stimulation (Fig. 2, 3, and 7). These findings argue against both the recruitment of new release sites harboring low-p<sub>v</sub> vesicles and a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we incorporated these perspectives into the Discussion and further clarified the reasoning behind our conclusions.

      (4) In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca<sup>2+</sup> below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the Reviewer suggested, low external Ca<sup>2+</sup> concentration can lower release probability (p<sub>r</sub>). Given that both p<sub>v</sub> and p<sub>occ</sub> are regulated by [Ca<sup>2+</sup>]<sub>i</sub>, low external [Ca<sup>2+</sup>] may reduce not only p<sub>v</sub> but also p<sub>occ</sub>, both of which would contribute to a low p<sub>r</sub>. Under such conditions, it is plausible that the baseline p<sub>r</sub> (= p<sub>v</sub> x p<sub>occ</sub>) falls well below 0.1 (for instance, if p<sub>v</sub> decreases from 1 to 0.5 and p<sub>occ</sub> from 0.3 to 0.1, then p<sub>r</sub> = 0.05). In that case, p<sub>r</sub> has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca<sup>2+</sup>] accumulates during a train.
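
      The arithmetic behind this example can be checked with a short sketch. The values are the illustrative ones given in the reply above, not measurements, and the function names are introduced here purely for illustration.

```python
def release_probability(p_v: float, p_occ: float) -> float:
    """Per-site release probability, p_r = p_v * p_occ."""
    return p_v * p_occ


def max_fold_increase(p_r: float) -> float:
    """Largest possible fold increase, given that p_r cannot exceed 1."""
    return 1.0 / p_r


# Illustrative values from the text: normal versus low external Ca2+.
p_r_normal = release_probability(p_v=1.0, p_occ=0.3)
p_r_low_ca = release_probability(p_v=0.5, p_occ=0.1)

print(f"normal Ca2+: p_r = {p_r_normal:.2f}, headroom = {max_fold_increase(p_r_normal):.1f}x")
print(f"low Ca2+:    p_r = {p_r_low_ca:.2f}, headroom = {max_fold_increase(p_r_low_ca):.1f}x")
```

      With these numbers, a ten-fold enhancement is out of reach when the baseline p<sub>r</sub> is 0.3 but fits comfortably when the baseline is 0.05.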

      If p<sub>v</sub> is close to one, p<sub>r</sub> depends on p<sub>occ</sub>, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP in a train. Post-train recovery from facilitation would therefore depend on the return of the equilibrium between TS and LS vesicles to baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst, 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS) comprising a slow first step (-> LS) and a fast second step (-> TS). Under this two-step model, the slow first step (the de novo priming step) is the rate-limiting step that governs the development and recovery kinetics of facilitation. Given that the on and off rates for Ca<sup>2+</sup> binding to Syt7 are slow, it is plausible that Syt7 contributes to short-term facilitation (STF) through Ca<sup>2+</sup>-dependent acceleration of the first step (as shown in Fig. 9). During train stimulation, LS vesicles would slowly accumulate in a Syt7- and Ca<sup>2+</sup>-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery from facilitation would be limited by the slow recovery of LS vesicles.
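
      A minimal numerical sketch of such a two-step scheme is given below. It is not the model fitted in the manuscript: the state variables are the fractions of release sites that are empty, LS-occupied, or TS-occupied, the recycling pool is assumed to be non-limiting, and all four rate constants are hypothetical values chosen only so that the first step takes seconds and the second takes tens of milliseconds.

```python
def steady_state(k1, b1, k2, b2):
    """Analytic steady-state fractions (LS, TS) of the two-step scheme."""
    ls_rel = k1 / b1            # LS relative to empty sites
    ts_rel = ls_rel * k2 / b2   # TS relative to empty sites
    total = 1.0 + ls_rel + ts_rel
    return ls_rel / total, ts_rel / total


def simulate(k1, b1, k2, b2, ls0, ts0, t_end, dt=1e-3):
    """Forward-Euler integration; returns a list of (time, LS, TS)."""
    ls, ts = ls0, ts0
    trace = [(0.0, ls, ts)]
    for step in range(1, int(round(t_end / dt)) + 1):
        empty = 1.0 - ls - ts
        d_ls = k1 * empty - b1 * ls - k2 * ls + b2 * ts
        d_ts = k2 * ls - b2 * ts
        ls += d_ls * dt
        ts += d_ts * dt
        trace.append((step * dt, ls, ts))
    return trace


# Hypothetical rate constants (per second): slow step 1, fast step 2.
K1, B1, K2, B2 = 0.05, 0.1, 20.0, 40.0
ls_ss, ts_ss = steady_state(K1, B1, K2, B2)

# Assumed post-train state: LS and TS both elevated, already in LS/TS equilibrium.
trace = simulate(K1, B1, K2, B2, ls0=0.5, ts0=0.25, t_end=60.0)

# First time point at which TS has relaxed halfway back to baseline.
halfway = 0.5 * (0.25 + ts_ss)
t_half = next(t for t, _, ts in trace if ts <= halfway)
print(f"baseline TS = {ts_ss:.3f}, half-recovery time = {t_half:.1f} s")
```

      With these assumed constants, the LS/TS exchange equilibrates within tens of milliseconds, yet TS occupancy takes several seconds to relax, because the slow first step sets the pace.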

      Reference

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Please note that papers cited in the manuscript are not repeated here.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      The slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in the activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not imply a faster activity-dependent recovery.

      Higher baseline occupancy does not necessarily imply faster recovery of PPR either. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and the PPR recovery rate are given by k<sub>1</sub> / (k<sub>1</sub> + b<sub>1</sub>) and (k<sub>1</sub> + b<sub>1</sub>), respectively. The baseline occupancy thus depends on the ratio k<sub>1</sub>/b<sub>1</sub>, whereas the PPR recovery rate depends on the absolute values of k<sub>1</sub> and b<sub>1</sub>. Based on the p<sub>occ</sub> and PPR recovery time constants of WT and KD synapses, we expect a higher k<sub>1</sub>/b<sub>1</sub> but a lower (k<sub>1</sub> + b<sub>1</sub>) in Syt7 KD synapses than in WT ones.
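
      The two relations can be inverted to show how a higher occupancy can coexist with a slower recovery. In the sketch below, the recovery rates (23/s for WT, 18.5/s for KD) are taken from the text, whereas the occupancy values (0.3 and 0.5) are placeholders assumed for illustration, not the estimates reported in the manuscript.

```python
def refilling_rates(p_occ, recovery_rate):
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Recovery rates (per second) from the text; occupancies are assumed placeholders.
k1_wt, b1_wt = refilling_rates(p_occ=0.3, recovery_rate=23.0)
k1_kd, b1_kd = refilling_rates(p_occ=0.5, recovery_rate=18.5)

for label, k1, b1 in (("WT", k1_wt, b1_wt), ("KD", k1_kd, b1_kd)):
    print(f"{label}: k1 = {k1:.2f}/s, b1 = {b1:.2f}/s, "
          f"k1/b1 = {k1 / b1:.2f}, k1 + b1 = {k1 + b1:.1f}/s")
```

      Under these assumed occupancies, the KD synapse has the larger k<sub>1</sub>/b<sub>1</sub> ratio but the smaller sum k<sub>1</sub> + b<sub>1</sub>, which is the combination described above.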

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. Our results, however, do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al., 2020). This was not the case in our data (Fig. 3). We argued against this possibility in lines 203-208 of the first version of the manuscript.
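
      The expected upward deviation can be illustrated with the standard binomial variance-mean relation, variance = q × mean − mean²/N, which neglects quantal variability. The quantal size, site numbers, and mean amplitude below are hypothetical and chosen only for illustration.

```python
def binomial_variance(mean, q, n_sites):
    """EPSC amplitude variance for n_sites independent sites of quantal size q."""
    return q * mean - mean ** 2 / n_sites


Q = 10.0         # hypothetical quantal size (pA)
N_FIXED = 5      # hypothetical number of release sites at baseline
N_RECRUITED = 8  # hypothetical number after recruitment of silent sites

mean_epsc = 30.0  # same mean amplitude (pA) in both scenarios
var_fixed = binomial_variance(mean_epsc, Q, N_FIXED)
var_recruited = binomial_variance(mean_epsc, Q, N_RECRUITED)
print(f"fixed N: {var_fixed:.1f} pA^2, recruited N: {var_recruited:.1f} pA^2")
```

      At the same mean amplitude, the scenario with recruited sites yields a larger variance than the fixed-N parabola, which is the deviation that would be expected if silent sites were recruited.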

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and thereby facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is far enough from the axon terminals that local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      Reference

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate which aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline p<sub>occ</sub> and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore indicated that slow site clearing is responsible for the late depression of the 40 Hz train EPSC. The knockdown experiments also provided information on the critical role of Syt7 in the replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments:

      (1) While the authors claim that Syt7-mediated facilitation is connected to the behavioral deficits they observed, this link is still somewhat speculative. This manuscript could benefit from further discussions of other alternative mechanisms to consider.

      We added the following statement to the Discussion of the revised manuscript:

      “The acquisition of trace fear memory was impaired by inhibition of persistent activity in mPFC during trace period (Gilmartin et al., 2013). The similar deficit observed in Syt7 KD animals is consistent with the hypothesis that STF provides bi-stable ensemble activity in a recurrent network (Mongillo et al., 2012). Nevertheless, alternative mechanisms may be responsible for the behavioral deficit. Not only recurrent network but also long-range loop between the mPFC and the mediodorsal (MD) thalamus play a critical role in maintaining persistent activity within the mPFC especially for a delay period longer than 10 s (Bolkan et al., 2017). Prefrontal L2/3 is heavily innervated by MD thalamus, and L2/3-PCs subsequently relay signals to L5 cortico-thalamic (CT) neurons (Collins et al., 2018). Given that L2/3 is an essential component of the PFC-thalamic loop, loss of STF at recurrent synapses between L2/3 PCs may lead to insufficient L2/3 inputs to L5 CT neurons and failure in the reverberant PFC-MD thalamic feedback loop. Therefore, not only L2/3 recurrent network but also its output to downstream network should be considered as a possible network mechanism underlying behavioral deficit caused by Syt7 KD L2/3.”

      (2) The authors mention that Syt7 contributes to persistent activity during working memory tasks but focus on using only a trace fear conditioning task. However, it would be interesting to see if their results are generalizable to other working memory tasks (i.e. a delayed alternation task).

      We thank the Reviewer for the insightful suggestion. Trace fear conditioning (tFC) shares behavioral properties with working memory (WM) tasks in that tFC is vulnerable to attentional distraction and to WM load. In general, WM tasks, including delayed alternation tasks such as the T-maze task, require persistent activity of ensemble neurons representing target-specific information among multiple choices. Unlike such WM tasks, tFC is not suited to examining target-specific ensemble activity. Because in vivo recordings in KD animals during delayed alternation tasks are not trivial, the effect of Syt7 KD on such tasks would be better addressed in a separate study.

      (3) The figure legend in Figure 6A and 6B mentions dotted lines and broken lines in the figure. However, this is confusing, and it is unclear as to what these lines are referring to in the figure.

      To avoid confusion in the legend for Figure 6A and 6B, we changed “dotted line” to “vertical broken line” and “broken lines” to “dashed parabolas”.

      (4) The manuscript can benefit from close reading and editing to catch typos and improve general readability (i.e. line 173: the word "are" is repeated twice).

      We corrected typographical errors throughout the manuscript and carefully read the manuscript to improve readability. A revised version reflecting these corrections has been prepared and will be resubmitted for your consideration.

      Reviewer #3 (Recommendations for the authors):

      The points in this section are all minor.

      (1) Line 44: Define release probability (p_r) more clearly. Authors use it to mean p<sub>v</sub>*p<sub>occ</sub>, but others routinely use it to mean p<sub>v</sub>*p<sub>occ</sub>*N.

      We understand that the Reviewer meant “others routinely use it to mean p<sub>v</sub>”. In this statement, we meant the conventional definition of release probability, that is, the release probability among vesicles of the RRP. We do not think it appropriate to redefine release probability as p<sub>v</sub> * p<sub>occ</sub> in the first paragraph of the Introduction. Therefore, we clarified this issue in the Discussion, as mentioned in our reply to the first weakness raised by Reviewer #3.

      (2) Line 82: For clarity, define better what recurrent excitatory synapses are. It seems that synapses between L2/3 PCs and local targets may all be recurrent?

      Both L2/3 and L5 of the prefrontal cortex harbor intralaminar recurrent excitatory synapses between pyramidal cells, which form a recurrent network. Previous theoretical studies have proposed that a single-layer recurrent network model can have bi-stable E/I balanced states (up- and down-states) if the recurrent excitatory synapses display short-term facilitation (STF), and can thus temporarily hold information once an external input shifts the network to the up-state. In this theory, synapses onto local targets across layers are not considered, and the specific roles of L2/3 and L5 in working memory tasks remain elusive. For clarity, we added a statement at the beginning of the paragraph (line 82): “Each of layer 2/3 (L2/3) and layer 5 (L5) of neocortex displays intralaminar excitatory synapses between pyramidal cells comprising a recurrent network (Holmgren et al., 2003; Thomson and Lamy, 2007)”

      (3) Cite earlier studies of short-term synaptic plasticity at synapses between L2/3 pyramidal neurons and local targets in mPFC. If there are none, take more explicit credit for being first.

      As we mentioned in the Introduction, previous studies on short-term plasticity (STP) at neocortical excitatory recurrent synapses have focused on synapses between L5 pyramidal cells (PCs) (Hempel et al., 2000; Wang et al., 2006; Morishima et al., 2011; Yoon et al., 2020). The local connectivity between L2/3 PCs in the somatosensory cortex has been elucidated by Holmgren et al. (2003) and Ko et al. (2011). Although these studies showed STP of EPSPs, it was examined at a fixed frequency or stimulus pattern and at high external [Ca<sup>2+</sup>] (2 mM). One study examined the frequency dependence of STP of EPSPs between L2/3 PCs (Feldmeyer et al., 2006). In contrast to our study, Feldmeyer et al. (2006) observed monotonic STD at all frequencies below 50 Hz, but that study was done in the somatosensory cortex and at high external [Ca<sup>2+</sup>] (2 mM). To our knowledge, no previous study has investigated STP at recurrent excitatory synapses between L2/3 pyramidal cells of the mPFC, especially at physiological external [Ca<sup>2+</sup>]. The present study therefore represents the first extensive investigation of STP at recurrent excitatory synapses in L2/3 of the mPFC under physiologically relevant external [Ca<sup>2+</sup>].

      References

      Feldmeyer D, Lubke J, Silver RA, Sakmann B (2002) Synaptic connections between layer 4 spiny neurone-layer 2/3 pyramidal cell pairs in juvenile rat barrel cortex: physiology and anatomy of interlaminar signalling within a cortical column. J Physiol 538:803-822.

      Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551:139-153.

      Ko H, Hofer SB, Pichler B, Buchanan KA, Sjöström PJ, Mrsic-Flogel TD (2011) Functional specificity of local synaptic connections in neocortical networks. Nature 473:87-91.

      Morishima M, Morita K, Kubota Y, Kawaguchi Y (2011) Highly differentiated projection-specific cortical subnetworks. Journal of Neuroscience 31:10380-10391.

      Wang Y, Markram H, Goodman PH, Berger TK, Ma J, Goldman-Rakic PS (2006) Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat Neurosci 9:534-542.

      (4) I couldn't figure out the significance of Figure S3. Perhaps this could be explained better.

      Optical minimal stimulation methods have not previously been documented in detail. This figure illustrates the parameters that must be examined carefully to achieve optical minimal stimulation, which ideally stimulates a single afferent fiber. Single-fiber stimulation by optical minimal stimulation is supported by the similarity of our estimate of the number of release sites (N) to the previous morphological estimate (Holler et al., 2021). For minimal stimulation, a collimated DMD-coupled LED was employed to restrict 470 nm illumination to a small, well-defined region within layer 2/3 of the prelimbic mPFC, and we carefully adjusted the illumination radius such that an illumination one step smaller (by 1 μm) failed to evoke EPSCs. Our typical illumination radius ranged between 3 and 4 μm, as shown in Figure S3A. Under this minimal illumination area, we confirmed unimodal distributions of the EPSC parameters (amplitude, rise time, decay time, and time to peak; Figure S3B-E). Recordings that did not meet this criterion were excluded from analysis. We hope this explanation provides a clearer understanding of the figure's significance.

      (5) Note that CTZ seems to alter p_r at some synapses.

      We acknowledge that CTZ can increase release probability by blocking presynaptic K<sup>+</sup> currents. Indeed, Ishikawa and Takahashi (2001) reported that CTZ slowed the repolarizing phase of presynaptic action potentials and increased the frequency of miniature EPSCs at the calyx synapses. Consistently, we observed a slight increase in the baseline EPSC amplitude, from 33.3 pA to 41.9 pA (p=0.045), following the application of 50 µM CTZ. However, given that vesicular release probability (p<sub>v</sub>) is already close to 1 at the synapse of our interest, we believe that the observed effect is more likely attributable to an increase in release site occupancy (p<sub>occ</sub>), which would be reflected as the increase in miniature EPSC frequency reported by Ishikawa and Takahashi (2001). Given that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub>, this increase in p<sub>occ</sub> would not critically change our conclusion that AMPA receptor desensitization is not responsible for the strong PPD.

      Reference

      Ishikawa, T., & Takahashi, T. (2001). Mechanisms underlying presynaptic facilitatory effect of cyclothiazide at the calyx of Held of juvenile rats. The Journal of Physiology, 533(2), 423-431.

      (6) Figure 8B. The result in Figure 8C seems important, but I couldn't figure out why behaviour was not altered during the acquisition phase summarized in Figure 8B. Perhaps this could be explained more clearly for non-experts.

      Little difference in freezing behavior during acquisition has also been observed when prelimbic persistent firing was optogenetically inhibited (Gilmartin, 2013). Not only the CS (tone) but also other sensory inputs (visual, olfactory, etc.) and the spatial context could serve as cues predicting the US (shock). Moreover, during the acquisition phase, the electric shock itself inherently induces a freezing response as a natural defensive behavior, which may obscure specific behavioral changes related to the associative learning process. Therefore, freezing behavior during acquisition cannot be regarded as a sign of a specific association between CS and US. Instead, on the next day, we specifically evaluated the CS-US association of the conditioned animals by measuring freezing behavior in response to the CS in a distinct context. We explicitly documented the small difference between WT and KD animals during the acquisition phase in the relevant paragraph (line 397).

  2. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given the fact that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along in the same order for all sections, so that it's very clear for comparative purposes what the new data are and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data.)

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than stage 33 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies, and I don't think your list is exhaustive. You need to expand this list and history, which will help with your ultimate comparative analysis without you needing to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      While a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said, historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said, there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at more taxa than you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info you present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results read more like a discussion than a presentation of the new data, and you jump directly into making an argument from those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about how other elasmobranchs are mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph is describing why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else: maybe as part of the first figure, rather than having just the graphs, have a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would relate back, then, to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      The second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, as is how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up the entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7, so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of both of those to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put those into the methods, and into the discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words.

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods. (lines 486-488)

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, which actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      We included more details throughout the Results upon each dye’s first description on what is generally reflected by the specific dyes of the staining protocols. (lines 178, 180, 184, 223, 227, and 243-244)

      Discussion

      Please completely remove figure 7, and please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development, but that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures, but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species), on the other hand, have bone but do not have SCPP genes (Liu et al. 2016). Other calcium-binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor; at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also, only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arratia and Kölliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with full mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term 'non-tesseral trabecular mineralization', possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologist have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study. (lines 440-453)

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript investigates the role of the membrane-deforming cytoskeletal regulator protein Abba in cortical development and its potential implications for microcephaly. It is a valuable contribution to the understanding of Abba's role in cortical development. The strengths and weaknesses identified in the manuscript are outlined below:

      Clinical Relevance:

      The authors identified a patient with microcephaly and intellectual disability harboring a mutation in Abba (R671W), adding a clinically relevant dimension to the study.

      Mechanistic Insights:

      The study offers valuable mechanistic insights into the development of microcephaly by elucidating the role of Abba in radial glial cell proliferation, radial fiber organization, and the migration of neuronal progenitors. The identification of Abba's involvement in the cleavage furrow during cell division, along with its interaction with Nedd9 and positive influence on RhoA activity, adds depth to our understanding of the molecular processes governing cortical development.

      In Vivo Validation:

      The overexpression of mutant Abba protein (R671W), which results in phenotypic similarities to Abba knockdown effects, supports the significance of Abba in cortical development.

      Weaknesses:

      The findings in the study suggest that heterozygous expression of the R671W variant may exert a dominant-negative effect on ABBA's role, disrupting normal brain development and leading to microcephaly and cognitive delay. However, evidence also points to a possible gain-of-function effect, as the mutation does not decrease RhoA activity or PH3 expression in vivo. Additionally, the impact of ABBA depletion on cell fate is not fully addressed. While abnormal progenitor accumulation in the ventricular and subventricular zones is observed, the transition of progenitors to neuroblasts and their ability to support neuroblast migration remains unclear. Impaired cleavage furrow ingression and disrupted Nedd9 and RhoA signaling could lead to structural abnormalities in radial glial progenitors, affecting their scaffold function and neuroblast progression. The manuscript lacks an exploration of the loss or decrease in interaction between Abba and NEDD9 in the case of the pathogenic patient-derived mutation in Abba. Furthermore, addressing the changes in localization and interaction of NEDD9 following over-expression of the mutant is important to further mechanistically characterize this interaction in future studies. These gaps suggest the need for further exploration of ABBA's role in progenitor cell fate and neuroblast migration to clarify its mechanistic contributions to cortical development.

      (1) Response to statement on dominant-negative vs. gain-of-function effect of R671W variant:

      We appreciate the reviewer’s thoughtful analysis of the potential mechanisms underlying the R671W variant. We agree that the heterozygous expression of the human R671W mutation may initially suggest a dominant-negative effect. However, our data indicate that this variant may instead exert a gain-of-function effect. As highlighted in the discussion section, overexpression of ABBA-R671W in cells that also express wild-type ABBA did not result in a dominant-negative decrease in RhoA activation nor affect PH3 expression in vivo. These findings suggest that the R671W mutation does not impair the canonical ABBA-mediated activation of RhoA, and instead, the resulting phenotype may involve post-mitotic processes, such as altered cell migration. This interpretation is further supported by previous clinical studies reporting additional patients with the same mutation and phenotypic outcomes.

      (2) Response to statement on ABBA depletion and progenitor-to-neuroblast transition:

      We agree that the question of how ABBA depletion affects cell fate and the progression of radial glial progenitors (RGPs) to neuroblasts is of significant importance. Our findings suggest that ABBA knockdown disrupts cleavage furrow ingression, which may block radial glial cells prior to abscission. This likely contributes to the observed accumulation of cells in the ventricular and subventricular zones, as seen in Figures 2A and 4D. Additionally, disrupted Nedd9 expression and impaired RhoA signaling appear to alter the structural integrity of RGPs, leading to detachment of apical and basal endfeet (Supplementary Figure 3). These structural abnormalities compromise the ability of RGPs to function as scaffolds for neuroblast migration. Although direct live imaging of neuroblast migration was beyond the scope of the current dataset, we believe our evidence strongly supports a model in which ABBA depletion disrupts progenitor structure and migration. Future studies will address these transitions more directly using live imaging and fate-mapping strategies.

      (3) Response to statement on loss of interaction between ABBA and NEDD9 with the R671W mutation:

      We fully agree with the importance of investigating whether the R671W mutation alters ABBA’s interaction with NEDD9. While our study provides evidence for a role of NEDD9 in mediating ABBA function, we acknowledge that we did not directly test whether the R671W mutation disrupts this interaction. We apologize if our manuscript conveyed the impression that this point had been fully addressed. Due to technical limitations, particularly the poor performance of anti-NEDD9 antibodies in slice immunohistochemistry, we were unable to reliably assess the interaction or localization changes in vivo. Nevertheless, this remains a priority for future studies aimed at better understanding the mechanistic underpinnings of the R671W mutation.

      (4) Response to statement on future directions for mechanistic characterization of NEDD9 localization and interaction:

      We agree with the reviewer that further investigation into NEDD9 localization and its interaction with the ABBA R671W mutant is essential to better define the molecular consequences of this mutation. Unfortunately, as mentioned above, the current tools available to us did not permit reliable immunohistochemical detection of NEDD9 in tissue. We fully intend to pursue alternative approaches, such as tagging strategies or the use of more sensitive detection platforms, to determine whether the R671W mutation affects the subcellular localization or stability of the ABBA-NEDD9 interaction. These experiments will be critical to elucidate the pathway through which ABBA regulates progenitor cell behavior and cortical development.

      Reviewer #2 (Public review):

      Summary:

      Carabalona and colleagues investigated the role of the membrane-deforming cytoskeletal regulator protein Abba (MTSS1L/MTSS2) in cortical development to better understand the mechanisms of abnormal neural stem cell mitosis. The authors used short hairpin RNA targeting Abba20 with a fluorescent reporter coupled with in utero electroporation of E14 mice to show changes to neural progenitors. They performed flow cytometry for in-depth cell cycle analysis of Abba-shRNA impact to neural progenitors and determined an accumulation in S phase. Using cultured rat glioma cells and live imaging from cortical organotypic slices from mice in utero electroporated with Abba-shRNA, the authors found Abba played a prominent role in cytokinesis. They then used a yeast-two-hybrid screen to identify three high confidence interactors: Beta-Trcp2, Nedd9, and Otx2. They used immunoprecipitation experiments from E18 cortical tissue coupled with C6 cells to show Abba requirement for Nedd9 localization to the cleavage furrow/cytokinetic bridge. The authors performed an shRNA knockdown of Nedd9 by in utero electroporation of E14 mice and observed similar results as with the Abba-shRNA. They tested a human variant of Abba using in utero electroporation of cDNA and found disorganized radial glial fibers and misplaced, multipolar neurons, but lacked the impact of cell division seen in the shRNA-Abba model.

      Strengths:

      Fundamental question in biology about the mechanics of neural stem cell division.

      Directly connecting effects in Abba protein to downstream regulation of RhoA via Nedd9.

      Incorporation of human mutation in ABBA gene.

      Use of novel technologies in neurodevelopment and imaging.

      Weaknesses:

      Unexplored components of the pathway (such as what neurogenic populations are impacted by Abba mutation) and unleveraged aspects of their data (such as the live imaging) limit the scope of their findings and left significant questions about the effect of ABBA on radial glia development.

      (1) Claim of disorganized radial glial fibers lacks quantifications.

      - On page 11, the authors claim that knockdown of Abba led to changes in radial glial morphology observed with vimentin staining. Here they claim misoriented apical processes, detached end feet, and decreased number of RGP cells in the VZ. However, they do not provide quantification of process orientation to better support their first claim. Measurements of radial glia fiber morphology (directionality, length) and of angle of division would be metrics that can be applied to data. Some of these analyses could be done in their time-lapse microscopy images, such as to quantify the number of cell divisions during their period of analysis (though that is short, 15 hours).

      Response to: Lack of quantification of disorganized radial glial fibers and cell divisions in time-lapse data

      We appreciate the reviewer’s insightful comment regarding the need for quantification of radial glial (RG) fiber morphology. In the revised manuscript, we have addressed this by providing new quantification of changes in vimentin staining, specifically measuring the dispersion of the signal as a proxy for fiber disorganization (see Supplementary Figure 1). These data support the observed morphological changes, including misoriented apical processes and detachment of endfeet, following Abba knockdown.

      Regarding time-lapse analysis to track cell divisions, we attempted to follow individual cells during the 15-hour imaging window. However, due to the relatively short duration and limited number of cells that could be reliably tracked, the dataset did not allow for statistically meaningful conclusions. As an alternative approach, we performed live-cell imaging using Anillin-GFP, a reliable marker of mitotic progression. The distribution and accumulation of Anillin-GFP were analyzed in ABBA-shRNA3 and control conditions, and the results (now included in Supplementary Figure 3) indicate an increased number of cells arrested in late mitosis upon ABBA knockdown. This supports the notion of disrupted cytokinesis as a consequence of Abba depletion.

      (2) Unclear where effect is:

      - In RG or neuroblasts? Is it in cell cleavage that results in accumulation of cells at VZ (as sometimes indicated by their data like in Fig 2A or 4D)? Interrogation of cell death (such as by cleaved caspase 3) would also help. Given their time lapse, can they identify what is happening to the RG fiber? The authors describe a change in "migration" but do not show evidence for this for either progenitor or neuroblast populations. Given they have nice time-lapse imaging data, could they visualize progenitor versus young neuron migration? Analysis of neuroblasts (such as with doublecortin expression in the tissue) would also help understand any issues in migration (of neurons v stem cells).

      - At cleavage furrow? In abscission? There is high resolution data that highlights the cleavage furrow as the location of interest (fig 3A), however there is also data (fig 3B) to suggest Abba is expressed elsewhere as well and there is an overall soma decrease. More detail of the localization of Abba during the division process would be helpful; for example, could co-localization (and potentially co-IP) with cleavage furrow proteins, such as Aurora B, help delineate subpopulations of Abba protein? Furthermore, the FRET imaging is a unique way to connect their mutation with function; could they measure/quantify differences at furrow compared to rest of soma to further corroborate that Abba-associated RhoA effect was furrow-enriched?

      - The data highlights nicely that a furrow doesn't clearly form when ABBA expression and subsequent RhoA activity are decreased (in Fig 3 or 5A). Does this lead to cells that can't divide because of poor abscission, especially since "rounding" still occurs? Or abnormal progenitors (with loss of fiber or inability to support neuroblast migration)? Or abnormal progression of progenitors to neuroblasts?

      Response to: Unclear location of the effect (RG vs. neuroblasts; cleavage furrow/abscission; migration issues)

      We thank the reviewer for this comprehensive and thought-provoking set of questions.

      a) Site of the effect – Radial Glia vs. Neuroblasts:

      Our data suggest that the primary effect of ABBA depletion occurs in radial glial progenitors (RGPs), specifically prior to abscission. We observed accumulation of electroporated cells in the ventricular zone (VZ), which we interpret as a result of cytokinetic failure (e.g., Figure 2A, 4D). We also documented detachment of apical and basal endfeet (see Supplementary Figure 3), further supporting structural disruption of RG fibers.

      b) Cell death analysis:

      We considered using cleaved caspase-3 as a marker for apoptosis, but due to its transient and non-specific activation during development, we opted to assess overall survival via quantification of RGP cell numbers and localization. This approach better reflects the developmental impact of ABBA knockdown (Supplementary Figure 3).

      c) Migration defects:

      We agree that distinguishing between progenitor and neuroblast migration would be highly informative. Although we did not perform doublecortin or similar staining to differentiate these populations in this dataset, the accumulation of electroporated cells in VZ/SVZ strongly suggests a migration deficit. Addressing this in detail will require new experiments using lineage-specific markers and longer time-lapse recordings, which we plan to explore in future studies.

      d) Cleavage furrow and abscission:

      Our high-resolution imaging of Anillin-GFP and FRET-based RhoA activity shows that ABBA localizes predominantly at the cleavage furrow. New quantifications of RhoA activity (now in Figure 5) show that the reduction in signaling is most pronounced at the furrow in ABBA knockdown cells. These findings align with the hypothesis that ABBA, through Nedd9 and RhoA, is essential for proper furrow formation and abscission.

      e) Mechanistic implications:

      As the reviewer notes, ABBA knockdown leads to cells that "round" but do not complete division, likely due to poor cleavage furrow ingression. This could generate abnormal progenitors that are structurally compromised (detached fibers) and thus unable to support neuroblast migration or proper differentiation. The cumulative result is disrupted progression from RGPs to neuroblasts, impaired structural scaffolding, and possibly reduced cell viability.

      (3) Limited to a singular time point of mouse cortical development

      On page 13, the authors outline the results of their Y2H screen with the identification of three high confidence interactors. Notably, they used an E10.5-E12.5 mouse brain embryo library rather than one that includes E14, the age of their in utero electroporation mice. Many of the authors' claims focus on in utero electroporation of shRNA-Abba of E14 mice that are then evaluated at E16-18. Justification for the focus on this age range should be included to support that their findings can then be applied to all of mouse corticogenesis.

      Response to: Use of E10.5–E12.5 library for yeast-two-hybrid (Y2H) screen

      We appreciate the reviewer’s concern regarding the developmental stage of the Y2H library. We chose the E10.5–E12.5 brain embryo library based on prior work demonstrating that ABBA expression is strongest during early cortical development, particularly in radial glia at these stages (see Saarikangas et al., J Cell Sci 2008). The radial glia-specific expression of ABBA was previously validated using RC2 and Tuj1 markers at E12.5. Thus, the library we used is well-suited for identifying interactors relevant to radial glial function, including Nedd9. We have clarified this rationale in the revised manuscript.

      (4) Detail of the effect of the human variant of the ABBA mutation in mouse is lacking.

      Their identification of the R671W mutation is interesting and the IUE model warrants more characterization, as they did with their original KD experiments.

      - Could they show that Abba protein levels are decreased (in either cell lines or electroporated tissue)?

      - While time-lapse morphology might not have been performed, more analysis on cell division phenotype (such as plane of division and radial glia morphology) would be helpful.

      Response to: Lack of detail on R671W human variant effects

      We thank the reviewer for encouraging further characterization of the R671W variant. In the revised manuscript, we now provide additional data on interkinetic nuclear migration (INM) defects resulting from R671W overexpression (see Supplementary Figure 3). These changes are consistent with disrupted radial glial organization and mirror aspects of the ABBA knockdown phenotype.

      a) Protein levels:

      We quantified ABBA expression in cells overexpressing the R671W variant (Supplementary Figure 5) and found no significant reduction compared to wild-type. This argues against a loss-of-function mechanism and supports a gain-of-function or dominant-interfering effect.

      b) Morphological and division phenotyping:

      While time-lapse imaging of R671W-expressing cells was not available in our dataset, we acknowledge that analyses such as division angle or radial glial morphology would be informative. Unfortunately, we were unable to perform these with the current data, but we agree these are important goals for future work.

      Reviewer 2 conclusion:

      The resubmission has addressed many of the questions raised.

      I have a few comments that should be addressed:

      (1) The authors maintain a deficit in "migration of immature neurons" which remains unsubstantiated. In their response, they state: "we believe that the data showing the accumulation of migrating electroporated cells in the ventricular (V) and subventricular (SV) zones provide compelling evidence of abnormal migration in ABBA-shRNA electroporated cells."

      - Firstly, they do not demonstrate that it's immature neurons, not RGs, that are affected. Secondly, accumulation of cells at the V-SVZ could be due solely to the inability of the RGC to undergo mitosis, therefore remaining stuck.

      The commentary of migration, especially of neurons, should be modified.

      We appreciate the reviewer’s careful reading and valid concern regarding our use of the term "migration of immature neurons." We fully agree that the current dataset does not definitively distinguish whether the accumulated cells in the ventricular (V) and subventricular (SV) zones are immature neurons or radial glial progenitors (RGPs) arrested in mitosis.

      To clarify, our observations indicate that electroporated cells accumulate in the VZ/SVZ following ABBA knockdown (Figures 2A and 4D), and this was interpreted as evidence of impaired migration. However, we now recognize that this accumulation may primarily reflect a block in cell cycle progression—specifically, at the stage of cleavage furrow ingression and abscission—rather than a migratory defect per se. This is supported by our new data using Anillin-GFP (Supplementary Figure 3), which show increased accumulation of cells with persistent Anillin expression, consistent with mitotic arrest. Furthermore, the detachment of apical and basal processes (also shown in Supplementary Figure 3) suggests that ABBA knockdown affects the structural integrity of RGPs, potentially compromising their scaffold function.

      In light of these points, we have revised the manuscript to temper our conclusions regarding “migration defects.” Specifically, we now refer to the phenotype as “abnormal accumulation of progenitor cells” and clarify that, while these findings are consistent with impaired cell progression or scaffolding required for migration, we do not directly demonstrate impaired migration of immature neurons. As suggested, addressing this would require additional analyses, such as time-lapse imaging of post-mitotic cells or staining with markers like Doublecortin, which are beyond the scope of the current dataset but will be a focus of future investigations.

      We thank the reviewer again for encouraging a more precise interpretation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Supplementary Fig 4B - The figure doesn't show an increase in percentage of PH3 positive cells in the NEDD9-shRNA condition. The control images are also missing for comparison. The figure legend needs to be corrected to match with the figure showing no significant changes.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Supplementary Fig 4.

      Reviewer #2 (Recommendations for the authors):

      Minor annotations for slice culture assay

      The authors should make note of ages of slice cultures in text and have better annotations of slice cultures (for example, in Fig 4-where is mitosis?)

      We are sorry for the mistake: it is not mitosis, it is the cleavage furrow stage. In addition, a new amended Figure 4 is provided.

      The effects are hard to see in lower mag slice images in Fig. 6. Would recommend focusing on higher mag to highlight RG differences.

      Thank you for this comment. This has been amended in the revised manuscript in the form of a new revised Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Xiao et al. classified retroperitoneal liposarcoma (RPLS) patients into two subgroups based on whole transcriptome sequencing of 88 patients. The G1 group was characterized by active metabolism, while the G2 group exhibited high scores in cell cycle regulation and DNA damage repair. The G2 group also displayed more aggressive molecular features and had worse clinical outcomes compared to G1. Using a machine learning model, the authors simplified the classification system, identifying LEP and PTTG1 as the key molecular markers distinguishing the two RPLS subgroups. Finally, they validated these markers in a larger cohort of 241 RPLS patients using immunohistochemistry. Overall, the manuscript is clear and well-organized, with its significance rooted in the large sample size and the development of a classification method.

      Thank you for your positive assessment of our study on classifying RPLS patients based on whole transcriptome sequencing. We appreciate your recognition of the distinct characteristics of the G1 and G2 groups, as well as the significance of our simplified classification system and the identification of LEP and PTTG1 as key molecular markers. Your acknowledgment of the clarity and organization of our manuscript, along with the importance of the large sample size, is greatly appreciated. We will continue to refine our work based on your feedback as we prepare for resubmission.

      Weakness:

      (1) While the authors suggest that LEP and PTTG1 serve as molecular markers for the two RPLS groups, the process through which these genes were selected remains unclear. The authors should provide a detailed explanation of the selection process.

      The selection criteria for identifying LEP and PTTG1 as biomarkers involved selecting prognostic genes that were highly expressed in C1 and C2, respectively, and achieved the highest AUC value in distinguishing the two RPLS groups (Page 17, lines 288-290).

      (2) To ensure the broader applicability of LEP and PTTG1 as classification markers, the authors should validate their findings in one or two external datasets.

      We sincerely appreciate your insightful suggestion regarding the external validation of LEP and PTTG1 as classification biomarkers. To address this concern, we performed an independent validation using an external liposarcoma cohort (GSE30929; Page 6, Lines 104-105), which comprises 140 primary liposarcoma samples with annotated clinicopathological and survival data. This dataset was selected due to its relevance to RPLS (N=63, 45%) and the availability of distant recurrence-free survival (DRFS) outcomes, aligning with the clinical focus of our study.

      Applying our previously established prognostic model (Risk value = 2.182 × PTTG1 - 2.204 × LEP) to this cohort, we stratified patients into high- and low-risk groups using the median risk score as the cutoff. Consistent with our original findings, the high-risk group exhibited significantly worse DRFS compared to the low-risk group. The ROC curves based on the 1-, 3-, 5-year survival status of patients demonstrated that this model can effectively predict patient DRFS (log-rank P < 0.001, Figure S3A-B). Furthermore, the high-risk group demonstrated a higher proportion of high-grade histology (P < 0.001, Fisher’s exact test, Figure S3C-D).
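      As an illustration only, the stratification step described above can be sketched as follows. The two coefficients are those quoted in the response; the function names, the sample identifiers, the expression values, and the handling of scores that fall exactly on the median (assigned here to the low-risk group) are assumptions made for this sketch and are not taken from the manuscript.

```python
from statistics import median

# Coefficients quoted in the response: Risk value = 2.182 * PTTG1 - 2.204 * LEP
PTTG1_WEIGHT = 2.182
LEP_WEIGHT = -2.204


def risk_value(pttg1: float, lep: float) -> float:
    """Linear risk score from PTTG1 and LEP expression values."""
    return PTTG1_WEIGHT * pttg1 + LEP_WEIGHT * lep


def stratify(samples: dict[str, tuple[float, float]]):
    """Split samples into high/low risk using the median score as cutoff.

    `samples` maps a sample ID to (PTTG1, LEP) expression values.
    Assumption: a score equal to the median is labeled "low".
    """
    scores = {sid: risk_value(p, l) for sid, (p, l) in samples.items()}
    cutoff = median(scores.values())
    groups = {sid: "high" if s > cutoff else "low" for sid, s in scores.items()}
    return groups, cutoff


# Hypothetical expression values, for demonstration only.
cohort = {
    "P1": (1.0, 3.0),
    "P2": (2.5, 0.5),
    "P3": (0.8, 1.2),
    "P4": (3.0, 2.0),
}
groups, cutoff = stratify(cohort)
print(groups)
```

      In this toy cohort, samples with high PTTG1 relative to LEP land in the high-risk group, which matches the sign of the two coefficients. Survival comparison between the groups (log-rank test, ROC analysis) is a separate step not shown here.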

      These results validate the robustness and generalizability of our risk stratification model across distinct liposarcoma cohorts. The external dataset’s alignment with our findings underscores the potential of LEP and PTTG1 as reproducible biomarkers for prognosis and therapeutic stratification in liposarcoma. We have incorporated these validation results into the revised manuscript (Page 18, Lines 305-315) to strengthen the clinical applicability of our conclusions.

      (3) Since molecular subtyping is often used to guide personalized treatment strategies, it is recommended that the authors evaluate therapeutic responses in the two distinct groups. Additionally, they should validate these predictions using cell lines or primary cells.

      We sincerely appreciate your insightful comments and suggestions regarding the evaluation of therapeutic responses and the validation of our predictions using cell lines or primary cells. We would like to address these points in detail below:

      (1) Purpose of the PTTG1- and LEP-based RPLS Classification Model

      The primary objective of our study was to develop a molecular subtyping model based on PTTG1 and LEP to guide personalized treatment strategies for patients with RPLS, particularly those classified as low-grade by traditional histopathological criteria but exhibiting poor prognosis. This subgroup of patients may benefit from more aggressive surgical resection, which is a potentially curative approach for RPLS. Our model aims to identify these high-risk patients to ensure complete tumor resection, thereby improving their clinical outcomes.

      (2) Therapeutic Response Evaluation in Distinct Groups

      In both our validation cohort and external validation cohort, surgical resection was the primary treatment modality for RPLS. After stratifying patients using our model, we observed significant differences in surgical outcomes between the two groups: the high-risk group exhibited poor prognosis, while the low-risk group showed favorable outcomes (Figure 5D-E and Figure S3A-B). Importantly, our model successfully identified low-grade histopathological cases with poor prognosis, who might otherwise be undertreated (Figure 5G-I and Figure S3C-D). By advocating for more thorough surgical resection in these high-risk patients, we aim to improve their prognosis. This achievement aligns with the primary goal of our study, which is to provide a molecular tool for personalized treatment guidance.

      (3) Future Validation and Functional Exploration of PTTG1 and LEP

      Our study has identified PTTG1 and LEP as key biomarkers for RPLS classification, and we recognize the urgent need to elucidate their molecular functions in RPLS pathogenesis. Here, we are pleased to report that we have already initiated cellular and animal experiments to investigate the roles of PTTG1 and LEP in RPLS. These experiments aim to validate our predictions and explore the underlying mechanisms by which these biomarkers contribute to tumor behavior and treatment response. We anticipate that the results of these studies will provide further mechanistic insights and will be submitted for publication in a suitable journal in the near future.

      Reviewer #2 (Public review):

      Surgical resection remains the most effective treatment for retroperitoneal liposarcoma. However, postoperative recurrence is very common and is considered the main cause of disease-related death. Considering the importance and effectiveness of precision medicine, the identification of molecular characteristics is particularly important for the prognosis assessment and individualized treatment of RPLS. In this work, the authors described the gene expression map of RPLS and illustrated an innovative strategy of molecular classification. Through the pathway enrichment of differentially expressed genes, characteristic abnormal biological processes were identified, and RPLS patients were simply categorized based on the two major abnormal biological processes. Subsequently, the classification strategy was further simplified through nonnegative matrix factorization. The authors finally narrowed the classification indicators to two characteristic molecules, LEP and PTTG1, and constructed novel molecular prognosis models that showed a large area under the curve. A relatively interpretable logistic regression model was selected to obtain the risk scoring formula, and its clinical relevance and prognostic evaluation efficiency were verified by immunohistochemistry. Recently, prognostic model construction has been a hot topic in the field of oncology. The interesting point of this study is that it effectively screened characteristic molecules and practically simplified the typing strategy while preserving close agreement with clinical outcomes. Overall, the study is well-designed and will serve as a valuable resource for RPLS research.

      Thank you for your insightful feedback on our manuscript. We appreciate your recognition of the importance of precision medicine and molecular characteristics in improving prognosis and individualized treatment for RPLS.

      We are pleased that you found our gene expression mapping and innovative molecular classification strategy valuable. Your positive remarks on our pathway enrichment analysis and the categorization of RPLS patients based on abnormal biological processes affirm our approach.

      We are also grateful for your acknowledgment of our focus on the characteristic molecules LEP and PTTG1, as well as the development of novel molecular prognosis models with significant predictive capability.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The authors study how the directionality of these cells is regulated along the dorsal trunk. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role in posterior migration. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths

      The manuscript is well written and presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling from the fat body and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine and that affecting JAK-STAT signaling results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. The work presented here provides some novel insights into the mechanism that ensures polarized migration of tracheal stem cells, preventing bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates including cancer cell migration. Overall, the authors have substantially improved their manuscript since the first submission but there are still some weaknesses.

      Weaknesses

      Overall, the manuscript lacks insights into the potential significance of the observed phenotypes and of the proposed new signaling model. Most of our concerns could be dealt with by adjusting the text (explaining some parts better and toning down some statements).

      (1) Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration, a quite discrete phenotype.

      The strongest migration defects quantified in graphs (e.g. 100 μm) are not shown in images because they would be out of frame; it would be beneficial to see them. In addition, the consequence of defects in polarized migration on tracheal development is not clear, and data showing phenotypes on the final trachea morphology in pupae are neither explained nor linked to the previous phenotypes.

      We agree with you that it is informative to show strong anterior migration (> 100 μm). Accordingly, we have shown examples in Figure 3B and Figure 7R-S. In addition, we have discussed the links between migration defects and the consequential phenotypes of the animal at a later developmental stage in the revised manuscript. The undisciplined migration leads to insufficient regeneration and incomplete remodeling of the airway and causes pupal lethality.

      (2) Some important information is lacking, such as the origin of mutant and UAS-RNAi lines, which are not reported in the material and methods. For instance, mutants for components of the JAK-STAT pathway are used but not described. Are they all viable at the pupal stage? Otherwise, pupae would not be homozygous mutants. From the figure legend, it seems that the Stat92EF allele has been used, which is a point mutation, thus not leading to an absence of protein. If the hopTUM allele has been used, as mentioned in the legend, it is a gain-of-function allele. Thus, the authors should not conclude that "The aberrant anterior migration of tracheal progenitors in the absence of JAK/STAT components led to impairment of tracheal integrity and caused melanization in the trachea (Figure 3-figure supplement 1E-I)".

      We apologize for the inadequate description of the experimental materials and methods. We have listed the stock numbers of the mutant and RNAi alleles in the Key resource table and Materials. The mutant alleles that we chose to examine can survive to the pupal stage, which is key to the success of our subsequent characterization of these mutants. According to your suggestion, we modified the statement for accuracy.

      (3) The authors observe that tracheal progenitors display a polarized distribution of Fat that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment using only 3 individuals with no statistics. This is insufficient to support the claim that "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity leading to asymmetric localization of Fat in progenitor cells", as mentioned in the abstract, or that "the activated tracheal progenitors establish a disciplined migration through the asymmetrical distribution of polarity proteins which is directed by an Upd2-JAK/STAT signaling stemming from the remote organ of fat body."

      We performed multiple biological replicates for the Ft distribution experiments and observed a similar trend, although we only showed three representative samples. In the revised text, we have included the n numbers and a statistical test.

      (4) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors. It remains somewhat unclear in the proposed model how Upd2 activates JAK-STAT signaling. Are vesicles internalized, as it seems to be proposed, and thus how does Upd2 activate JAK-STAT signaling intracellularly? Or is Upd2 released from vesicles to bind Dome extracellularly to activate the JAK-STAT pathway? Moreover, it is not clear nor discussed what would be the advantage of transporting the ligand in vesicles compared to classical ligand diffusion.

      We do not know whether the association between Upd2 and Lbm occurs inside or outside vesicles. The vesicular trafficking of Upd2 is our observation and is supported by various genetic and biochemical experiments. Our study does not imply that this vesicular trafficking has an advantage over diffusion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Idiopathic scoliosis (IS) is a common spinal deformity. Various studies have linked genes to IS, but underlying mechanisms are unclear such that we still lack understanding of the causes of IS. The current manuscript analyzes IS patient populations and identifies EPHA4 as a novel associated gene, finding three rare variants in EPHA4 from three patients (one disrupting splicing and two missense variants) as well as a large deletion (encompassing EPHA4) in a Waardenburg syndrome patient with scoliosis. EPHA4 is a member of the Eph receptor family. Drawing on data from zebrafish experiments, the authors argue that EPHA4 loss of function disrupts the central pattern generator (CPG) function necessary for motor coordination.

      The main strength of this manuscript is the human genetic data, which provides convincing evidence linking EPHA4 variants to IS. The loss of function experiments in zebrafish strongly support the conclusion that EPHA4 variants that reduce function lead to IS.

      The conclusion that disruption of CPG function causes spinal curves in the zebrafish model is not well supported. The authors' final model is that a disrupted CPG leads to asymmetric mechanical loading on the spine and, over time, the development of curves. This is a reasonable idea, but currently not strongly backed up by data in the manuscript. Potentially, the impaired larval movements simply coincide with, but do not cause, juvenile-onset scoliosis. Support for the authors' conclusion would require independent methods of disrupting CPG function and determining if this is accompanied by spine curvature. At a minimum, the language of the manuscript could be toned down, with the CPG defects put forward as a potential explanation for scoliosis in the discussion rather than as something this manuscript has "shown". An additional weakness of the manuscript is that the zebrafish genetic tools are not sufficiently validated to provide full confidence in the data and conclusions.

      We highly appreciate the reviewer’s insightful comments and the acknowledgment of the main values of our study. We agree with the reviewer that further experiments are needed to fully establish the relationship between CPG and scoliosis. In response, we have revised the conclusion in the manuscript to better reflect this. Additionally, we conducted further analyses on the mutants to provide additional evidence supporting this concept.

      Reviewer #1 (Recommendations for the authors):

      Epha4a mutant zebrafish exhibited mild spinal curves, mostly laterally and in the tail. This was 75% of homozygous mutants but also, surprisingly, about 20% of heterozygotes. epha4b mutants also developed some mild scoliosis. If the two zebrafish paralogs can compensate for each other (partial redundancy), we might expect more severe scoliosis in double mutants. Did the authors generate and analyze double mutants? I believe it would be very useful for this study to report the zebrafish phenotype of loss of both paralogs together.

      We appreciate the reviewer’s insightful comment regarding the potential value of reporting the phenotype of epha4a/epha4b double mutants. While we fully agree that this analysis would be valuable, our attempts to generate double mutants have been unsuccessful. These two genes are closely linked on the chromosome, with less than 100 kb separating them, which makes it challenging to generate double mutants through standard genetic crossing. Establishing a double mutant line would require more than a year due to the technical constraints of the process. Although we are unable to address this question directly at this time, we hypothesize that epha4a/epha4b double mutants may exhibit a higher likelihood of body axis abnormalities based on the phenotypes observed in single mutants and the known functions of these genes.

      We hope this perspective will provide some useful context despite the limitations.

      In Figure 1F, a pCDK5 western blot is performed as a readout of EPHA4 signaling after either WT or C849Y mutant EPHA4 is transfected into HEK 293T cells. It would be useful to mention in the text, or at least the figure legend, how this experiment was performed/where the protein samples came from. It is included in the methods, but in the main text, it simply says "we conducted western blotting" without mentioning whether the protein samples were from cell lines, patients, or another source.

      We apologize for the oversight. A detailed description of the western blotting procedure has been added to both the “Results” section (page 8, lines 187-190) and the Figure 1 legend.

      Was the relative turn angle biased to the left or right side of the fish? (i.e. is a positive angle a rightward or leftward turn?)

      We apologize for the unclear description. In Figure 3D, a positive angle indicates a left turn and a negative angle indicates a right turn. In wild-type larvae, the average turning angle over a 4-minute period is approximately 0, whereas in mutants this value deviates from 0, indicating a directional preference (positive for leftward and negative for rightward turns) in swimming behavior during the recording period. We have also added this information to the text and figure legend.
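
      The sign convention described in this response can be illustrated with a minimal sketch. It assumes that each larva's turns are available as a list of signed angles in degrees (positive for left, negative for right). The function name and the tolerance used to call an average "approximately 0" are hypothetical choices made for illustration and are not taken from the manuscript.

```python
from statistics import mean


def turning_bias(angles_deg, tolerance_deg=1.0):
    """Summarize the directional preference of one larva.

    angles_deg: signed turn angles in degrees; positive = left turn,
        negative = right turn (the convention stated for Figure 3D).
    tolerance_deg: hypothetical cutoff below which the average is
        treated as "approximately 0" (no directional preference).
    Returns (average_angle, bias) where bias is "left", "right" or "none".
    """
    if not angles_deg:
        raise ValueError("at least one turn angle is required")
    avg = mean(angles_deg)
    if abs(avg) <= tolerance_deg:
        return avg, "none"
    return avg, "left" if avg > 0 else "right"


# Balanced left and right turns average out to no preference.
print(turning_bias([10, -10, 5, -5]))
# Mostly positive angles indicate a leftward preference.
print(turning_bias([12.0, 8.0, -2.0]))
```

      Averaging the signed angles, rather than their magnitudes, is what lets symmetric swimming cancel out while a consistent preference for one side does not.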

      In Figure 4, morpholinos rather than mutants are used, but it is not clear why. Has it been established that the MO used disrupts gene function specifically? Can the effect of the MO be rescued by expressing a wild-type mRNA of Epha4a? Does MO knockdown induce spinal curves if fish are raised? Indeed, this could be a way to determine whether the spinal curves are caused by early events in development (when MOs are active).

      Thanks for the comments. The efficacy of the relevant MOs has been well documented in numerous previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Following this reviewer’s suggestion, we raised the epha4a morphants to adulthood, but no scoliosis was observed, suggesting that spinal curvature formation may be induced by long-term defects in the absence of Epha4a. Additionally, we reconfirmed the abnormal motor neuron activation frequency phenotype in the mutant background. The corresponding data have replaced the original Figure 4 in the manuscript.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Reviewer #2 (Recommendations for the authors):

      Supplementary Table 3 is missing.

      Sorry for any inconvenience caused to the reviewers. Due to the size of the supplementary Table 3, we have separately uploaded an Excel file as supplementary materials. We have also double-checked during the resubmission process of the revised manuscript. Thanks for your thorough review.

      The authors report only a single mutant allele for zebrafish epha4a and epha4b. Additionally, they provide no information about how many generations each allele has been outcrossed. The authors should provide some type of validation that the phenotypes they describe result from loss of function of the targeted gene and not from an off-targeting event.

      Thanks for the comments. For epha4a and epha4b mutants, each homozygous mutant was initially derived from the self-crossing of first filial generation heterozygotes, and subsequent homozygous generations were maintained for fewer than three rounds of in-crossing. Interestingly, we observed a reduction in the incidence of scoliosis across successive generations. This trend may be attributed to potential genetic compensation mechanisms, which could mitigate the phenotypic severity over time. To address concerns about possible off-target effects, we synthesized and injected epha4a mRNA to test for phenotypic rescue. Our data show that epha4a mRNA injection partially restored swimming coordination in the mutants (Fig. S5). Moreover, similar motor coordination defects have been reported in Epha4-deficient mice, as documented in previous studies (Kullander et al., 2003; Borgius et al., 2014). These findings collectively strengthen the hypothesis that Epha4a plays a critical role in regulating motor coordination.

      References

      (1) Borgius, L., Nishimaru, H., Caldeira, V., Kunugise, Y., Low, P., Reig, R., Itohara, S., Iwasato, T., and Kiehn, O. (2014). Spinal glutamatergic neurons defined by EphA4 signaling are essential components of normal locomotor circuits. J Neurosci 34, 3841-3853.

      (2) Kullander, K., Butt, S.J., Lebret, J.M., Lundfald, L., Restrepo, C.E., Rydstrom, A., Klein, R., and Kiehn, O. (2003). Role of EphA4 and EphrinB3 in local neuronal circuits that control walking. Science 299, 1889-1892.

      The authors need to provide allele designations for the mutant alleles following accepted nomenclature guidelines.

      Thank you for your careful review! We have reviewed and made revisions to the genes and mutation symbols throughout the entire text.

      The three antisense morpholino oligonucleotides need to be validated for efficacy and specificity.

      Thanks for the comments. The efficacy of these morpholinos has been thoroughly validated in multiple previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Furthermore, we performed swimming behavior analysis in the mutant background, which showed results similar to those of the morphants. We also performed rescue experiments to confirm the specificity of the mutants (Fig. S5). Finally, we reconfirmed the abnormal calcium signaling in the mutants (Fig. 4), which further supports our previous knockdown results.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Line 229. "While in consistent with previous reports, the hindbrain rhombomeric boundaries were found to be defective....". This sentence is not clear. Please describe how it is "inconsistent".

      Thanks for the comments, and we apologize for the unclear description. We have described this more clearly in our revised manuscript (page 9, lines 229-230).

      Animals frequently are described as "heterozygous mutants" or "mutants". Please make clear that the latter are homozygous mutant animals.

      Thanks for the comments. In the manuscript, all references to mutants specifically indicate homozygous mutants. Heterozygous mutants are explicitly identified as such.

      The chromatin interaction portion of the Methods does not include any information on how these experiments were conducted or where the data were obtained. This information needs to be provided.

      Thanks for your advice. The detailed information of chromatin interaction mapping has been provided in “Methods and Materials” (page 18-19, line 450-455). Information about the interacting regions was derived from Hi-C datasets of 21 tissues and cell types provided by GSE87112. The significance of interactions for Hi-C datasets was computed by Fit-Hi-C, with an FDR ≤ 10<sup>-6</sup> considered significant.
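
      The significance cutoff stated in this response can be sketched as a simple filter. The sketch assumes that the interaction calls have already been parsed into dictionaries; the field names (`region_a`, `region_b`, `fdr`) and the example values are hypothetical and do not reflect the actual Fit-Hi-C output format or the authors' pipeline.

```python
FDR_THRESHOLD = 1e-6  # cutoff stated in the response: FDR <= 10^-6


def significant_interactions(interactions, threshold=FDR_THRESHOLD):
    """Keep only interaction records whose FDR is at or below the threshold.

    interactions: iterable of dicts, each with a hypothetical "fdr" key
        holding the multiple-testing-corrected significance value.
    """
    return [rec for rec in interactions if rec["fdr"] <= threshold]


# Hypothetical records for illustration only.
calls = [
    {"region_a": "bin_1", "region_b": "bin_7", "fdr": 3e-9},
    {"region_a": "bin_1", "region_b": "bin_9", "fdr": 1e-6},
    {"region_a": "bin_2", "region_b": "bin_5", "fdr": 4e-3},
]
kept = significant_interactions(calls)
print([(rec["region_a"], rec["region_b"]) for rec in kept])
```

      The comparison is inclusive, so a call with an FDR of exactly 10<sup>-6</sup> is retained, matching the "≤" in the stated criterion.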

      The authors present single-cell RNA-seq data in Supplementary Figure 5 for which they cite Cavone et al, 2021. This seems like an odd database to use. Can the authors provide an explanation for choosing it? In any case, the citation should also be made in the Supplementary Figure 5 legend.

      Thank you for your rigorous comment. We have cited this literature in the proper place in the revised manuscript. Cavone et al. used the her4.3:GFP line to label ependymo-radial glia (ERG) progenitor cells and performed single-cell RNA-seq on FACS-isolated fluorescent cells. The isolated cells included not only ERG progenitors but also undifferentiated and differentiated neurons and oligodendrocytes. The authors attributed this to the relative stability of the GFP protein, which remained in the progeny of GFP-expressing her4.3+ ERG progenitor cells, thus effectively acting as a short-term cell lineage tracer. Indeed, clustering analysis of these data successfully identifies neural progenitors and other neural clusters. Therefore, we consider that these scRNA-seq data encompass a comprehensive range of neural cell types and are suitable for analyzing the expression of genes of interest. Furthermore, we downloaded and analyzed the scRNA-seq data of the zebrafish nervous system reported by Scott et al. in 2021 (Fig. S7B) (Scott et al., 2021). Despite differences in the developmental stages of the larvae analyzed (Cavone et al. examined larvae at 4 dpf, whereas Scott et al. analyzed larvae at 24, 36, and 48 hpf), our findings are consistent. Specifically, epha4a and epha4b are expressed in interneurons, whereas efnb3a and efnb3b are enriched in floor plate cells.

      References

      (1) Scott, K., O'Rourke, R., Winkler, C.C., Kearns, C.A., and Appel, B. (2021). Temporal single-cell transcriptomes of zebrafish spinal cord pMN progenitors reveal distinct neuronal and glial progenitor populations. Dev Biol 479, 37-50.

      In Figure Legend 1, "expressed from the EPHA4-mutant plasmid" is not an accurate description of the experiment.

      Sorry for the previous inaccurate description. The description has been revised to accurately reflect the experiment. “Western blot analysis of EPHA4-c.2546G>A variant showing the protein expression levels of EPHA4 and CDK5 and the amount of phosphorylated CDK5 (pCDK5) in HEK293T cells transfected with EPHA4-mutant or EPHA4-WT plasmid”.

      Figure 3 panels J and K need more explanation. I don't understand what the different colors represent nor do I understand what are wild type and what are mutant data.

      Thank you for your valuable feedback. We apologize for the lack of clarity in the original figure legend. To address this, we have revised the legend of Figure 3 to provide a more detailed explanation. In panels J and K, each color-coded curve represents the response of an individual larva from an independent experimental trial to the stimulus. Specifically, panel J depicts the response data for the wild-type larvae, whereas panel K presents the response data for the homozygous epha4a mutants.

      Please provide the genotypes for the images in Figure 5A.

      Thanks for the comment, and we apologize for the unclear description. We have described this more clearly in Figure 5.

      Figure legend 6B should also note the heterozygote data with the wild type and homozygous mutant data.

      Thanks for the comment. The data are now included in Figure 6B.

      Epha4 and Efnb3 have well-established roles in axon guidance. Although this is noted in the Discussion, I think a more extensive description of prior findings would be helpful.

      Thanks for your valuable feedback. A more detailed description of the roles of Epha4 and Efnb3 in axon guidance has been provided in the “Discussion” (page 16, lines 388-396).

      The main conclusion of this manuscript is that EPHA4 variants cause IS by disrupting central pattern generator function. I think this is misleading. I think that the more valid conclusion is that EPHA4 loss of function causes axon pathfinding defects that impair locomotion by disrupting CPG activity, thereby leading to IS. I urge the authors to consider this more nuanced interpretation.

      Thank you for your insightful comments. We appreciate your suggestion to refine our main conclusion. We agree that the proposed revision more accurately reflects our findings and will revise the manuscript accordingly to state that “EPHA4 loss of function causes axon pathfinding defects, which impair locomotion by disrupting central pattern generator activity, potentially leading to IS.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now performed the suggested experiments (see the new Fig. S3B): rescuing FLWR-1 in cholinergic neurons and in muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABA neurons, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to effects on the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, along with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise, the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, the data shown in Fig. 3B represent GFP::FLWR-1 expressed under its own promoter and TagRFP::ELKS-1 expressed exclusively in GABAergic neurons. Given that the pflwr-1 drives expression in both cholinergic and GABAergic neurons, and that cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. This data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH to the external, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it would be a Ca<sup>2+</sup> channel. Of course the reviewer is right, FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even, accessory subunit of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact on Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947–960), showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmatic Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1 interacting with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA2 in C. elegans), which helps assembling the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. This data is presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While the wild type FLWR-1 rescued PH-GFP levels at the CC membrane to the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, that recruits the PIP2 kinase to the plasma membrane via a lysine and arginine containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We mention this now in the discussion. We also discussed our data with respect to the findings of Li et al., about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than less bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence trying to better explain our reasoning. Mainly, the argument is that accumulation of such endosomes of unusually large size is seen in mutants affecting formation of SVs from the endosome (in endophilin and synaptojanin mutants), while mutants affecting mainly endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse, and their size. However, this work used a much shorter stimulus and froze the preparations within a few dozen to hundreds of msec after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 sec stimulation and freezing after 5 sec. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, which ranged from 50-200 nm in diameter and had a clear lumen, were morphologically similar to the bulk endosomes as observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by Alphafold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel, however, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observations of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could be underlying the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce large Cl<sup>-</sup> conductance in oocytes, that would likely be problematic / killing the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. This data is shown now in Fig. 9A, B. Also, we show now that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings are not supporting the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We did substantiate the interactions of FLWR-1 and the PMCA, as well as assessing the function of FLWR-1 in the coelomocytes and the function of FLWR-1 in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis for wild type, comparing to flwr-1 mutants and flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that we observed differences in swimming assays also only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require achieving expression levels that are very precisely matching endogenous expression levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, and could thus improve the outcome of the assay in that it became more conclusive. Meaning, the double mutation was not exacerbating the effect of either single mutant, demonstrating that FLWR-1 and MCA-3 are acting in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to final minor critiques following initial revision

      Reviewer #1 (Recommendations for the authors): 

      The authors have generally done an excellent job of addressing my and the other reviewers' concerns. I have a few additional concerns that the authors could consider addressing through changes to the text: 

      We thank the Reviewer for this assessment and are glad to have addressed the major points.

      - Regarding the gRNA used for NMR studies, I thank the authors for adding additional rationale for their design of the RNA used. However, I still believe that it is misleading to term this RNA as a "gRNA", given that it is mainly composed of a sequence that is arbitrary (the spacer) and the sections of the gRNA that are constant between all gRNAs are truncated in a way that removes secondary structure that is likely essential for specific contacts with the Rec domains. I do not believe the authors need to make alterations to any of their experiments. However, I do think their description of the "gRNA" should be updated to properly reflect that this RNA lacks any of the secondary structure present in a typical gRNA, much of which is necessary to confer specificity of binding between GeoCas9 and the gRNA. As mentioned in my previous review, this may be best achieved by adding a cartoon of the secondary structure of the full-length gRNA and highlighting the region that was used in the truncated "gRNA". 

      We understand the Reviewer’s point. For any experiment in which the gRNA was truncated (i.e. NMR or some MST studies), we have clarified the text and no longer call it a “gRNA.” We state initially that it is a portion of the gRNA and then call it simply an “RNA.” 

      For experiments using the full-length constructs, we have kept the term “gRNA,” as it remains appropriate.

      We have also added a final Supplementary figure (S12) showing the structures of the truncated and full-length RNAs used, based on the _Geo_Cas9 cryo-EM structure and predicted with RNAfold.

      - Lines 256-257: "The ~3-fold decrease in Kd...". I believe the authors are discussing the Kd's of the mutants relative to WT, in which case the Kd increased. Also, the fold-change appears closer to 2-fold than to 3-fold. 

      Yes, the Reviewer makes a good catch. We have corrected this.

      - Lines 407-408: "The mutations also diminished the stability of the full-length GeoCas9 RNP complex." This statement seems at odds with the authors' conclusions in the Results section that the full-length GeoCas9 variants had comparable affinities for the gRNAs (lines 376-382). 

      We agree that this seems contradictory. In the absence of full-length structures for all variants, we can’t definitively state what causes this. It could be that the mutation has an interesting allosteric effect on structure that does not affect RNA binding but induces the Cas9 protein to simply fall apart at lower temperatures, rendering the binding interaction moot. We have added a statement to this section.

      - The authors chose to keep "SpCas9" for consistency with their prior work and the work of many others, including Doudna et al and Zhang et al. However, I will note that in their publications on GeoCas9, the Doudna lab did use SpyCas9 to ensure consistent nomenclature within the publications. 

      We have made the change to “_Spy_Cas9”

      Reviewer #3 (Recommendations for the authors): 

      The authors clearly answered most of my concerns. I still have some technical questions about the analysis of CPMG-RD data but the numbers provided now seem to make sense. While I still think that crystal structures of the point mutant would make the conclusions more "bullet proof", I do appreciate the work associated with this and consider that the manuscript can be published as is. 

      We agree that additional magnetic fields could allow for additional models of CPMG data fitting and that additional crystal structures of the mutants could add to the conclusions. We appreciate the Reviewer recognizing the balance of the current results and potential future studies in signing off on publication.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Nakagawa and colleagues report the observation that YAP is differentially localized, and thus differentially transcriptionally active, in spheroid cultures versus monolayer cultures. YAP is known to play a critical role in the survival of drug-tolerant cancer cells, and as such, the higher levels of basally activated YAP in monolayer cultures lead to higher fractions of surviving drug-tolerant cells relative to spheroid culture (or in vivo culture). The findings of this study, revealed through convincing experiments, are elegantly simple and straightforward, yet they add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology simply because the abundance of residual cells in this format is much greater than in spheroid or xenograft models. The potential linkage between matrix density and stiffness and YAP activation, while only speculated upon in this manuscript, is intriguing and a rich starting point for future studies.

      Although this work, like any important study, inspires many interesting follow-on questions, I am limiting my questions to only a few minor ones, which may potentially be explored either in the context of the current study or in separate, follow-on studies.

      We appreciate Reviewer #1's comments that our work is of importance to the field and particularly that it will "...add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology..."  We have sought to highlight the importance of how our findings could be applied to study resistance mechanisms at various points in the manuscript.

      Strengths:

      The major strengths of the work are described above.

      Weaknesses:

      Rather than considering the following points as weaknesses, I instead prefer to think of them as areas for future study:

      (1) Given the field's intense interest in the biology and therapeutic vulnerabilities of residual disease cells, I suspect that one major practical implication of this work could be that it inspires scientists interested in working in the residual disease space to model it in monolayer culture. However, this relies upon the assumption that drug-tolerant cells isolated in monolayer culture are at least reasonably similar in nature to drug-tolerant cells isolated from spheroid or xenograft systems. Is this true? An intriguing experiment that could help answer this question would be to perform gene expression profiling on a cell line model in the following conditions: monolayer growth, drug tolerant cells isolated from monolayer growth conditions, spheroid growth, drug tolerant cells isolated from spheroid growth conditions, xenograft tumors, and drug tolerant cells isolated from xenograft tumors. What are the genes and programs shared between drug-tolerant cells cultured in the three conditions above? Which genes and programs differ between these conditions? Data from this exercise could help provide additional, useful context with which to understand the benefits and pitfalls of modeling residual tumor cell growth in monolayer culture.

      We thank the reviewer for suggesting valuable future studies. We agree that the proposed experiments represent important next steps in understanding the role of YAP and other pathways in primary resistance. We believe, however, these experiments are both beyond the scope of the current manuscript and beyond what can reasonably be addressed in a revision. The distinct challenges associated with comparing in vivo and in vitro conditions would require significant optimization of single-cell approaches, especially given the robust cell death driven by afatinib treatment in vivo. Given the complexity of in vivo experimentation, we are concerned that such studies may not guarantee biologically meaningful insights. Nonetheless, we agree that this is a compelling direction for future research. If common gene expression patterns could be identified despite these challenges, such studies could help validate monolayer culture as a relevant model for investigating residual disease.

      (2) In relation to the point above, there is an interesting and established connection between mesenchymal gene expression and YAP/TAZ signaling. For example, analyses of gene expression data from human tumors and cell lines demonstrate an extremely strong correlation between these two gene expression programs. Further, residual persister cancer cells have often been characterized as having undergone an EMT-like transition. From the analysis above, is there evidence that residual tumor cells with increased YAP signaling also exhibit increased mesenchymal gene expression?

      We agree with the reviewer that a connection between YAP/TAZ activity and EMT is likely, given prior studies exploring correlations between these two gene signatures. We believe, however, that exploring EMT represents a distinct research direction from the primary focus of the current manuscript. We are concerned that exploration of EMT, especially in the absence of corresponding preclinical models or mechanistic data directly linking EMT to therapy resistance in our models, could distract from the main conclusions of the manuscript. While we plan to stain for EMT-associated markers in the residual cancer tissue from the in vivo studies, it remains unclear whether such data would meaningfully contribute to the revised manuscript, regardless of the outcome.

      Reviewer #2 (Public review):

      The manuscript by Nakagawa R, et al describes a mechanism of how NSCLC cells become resistant to EGFR and KRAS G12C inhibition. Here, the authors focus on the initial cellular changes that occur to confer resistance and identify YAP activation as a non-genetic mechanism of acute resistance.

      The authors performed an initial xenograft study to identify YAP nuclear localization as a potential mechanism of resistance to EGFRi. The increase in the stromal component of the tumors upon Afatinib treatment leads the authors to explore the response to these inhibitors in both 2D and 3D culture. The authors extend their findings to both KRAS G12C and BRAF inhibitors, suggesting that the mechanism of resistance may be shared along this pathway.

      The paper would benefit from additional cell lines to determine the generalizability of the findings they presented. While the change in the localization of YAP upon Afatinib treatment was identified in a xenograft model, the authors do not return to animal models to test their potential mechanism, and the effects of the hyperactivated S127A YAP protein on Afatinib sensitivity in culture are modest. Also, combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies.

      We thank the reviewer for their insightful comments. In this manuscript, we present data from 5 cell lines representing the EGFR/BRAF/KRAS pathway, demonstrating the generalizability of YAP-driven decreased cancer cell sensitivity to targeted inhibitors when cultured in 2D compared to spheroid counterparts. While expanding this analysis to a larger panel of cell lines is beyond the scope of the current study, we believe our findings provide a strong rationale for future investigations, including high-throughput screens conducted by other research groups and pharmaceutical companies, to recognize the value in screening spheroid cell cultures. We hope this work helps the field of cancer therapeutics incorporate screening approaches that better reflect tumor biology into drug discovery pipelines, and we believe this could be one of the most impactful and enduring contributions of our study.

      Reviewer #2 also mentions that "...combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies..."  The concept that YAP/TAZ inhibitors (i.e. TEAD inhibitors) could be additive or synergistic in 2D culture is one that is being actively tested across several groups and in pharma. Several recent examples include a publication by Hagenbeek, et al., Nat. Cancer, 2023 (PMID: 37277530) showing that a TEAD inhibitor overcomes KRASG12C inhibitor resistance. Additionally, recent work by Pfeifer, et al., Comm. Biol., 2024 (PMID: 38658677) suggests a similar effect between EGFR inhibitors and a different TEAD inhibitor. While neither of these studies probes cell death pathways as extensively as our studies do, they nevertheless provide strong evidence that TEAD inhibition combined with targeted EGFR/RAF/RAS inhibition in 2D has additive, if not synergistic, effects. We feel that these recently published studies affirm our findings and that repeating such experiments is unlikely to add much new information. We thus feel they are beyond the scope of our present studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      Olfactory sensory neurons (OSNs) in the olfactory epithelium detect myriads of environmental odors that signal essential cues for survival. OSNs are born throughout life and thus represent one of the few neurons that undergo life-long neurogenesis. Until recently, it was assumed that OSN neurogenesis is strictly stochastic with respect to subtype (i.e. the receptor the OSN chooses to express).

      However, a recent study showed that olfactory deprivation via naris occlusion selectively reduced birthrates of only a fraction of OSN subtypes and indicated that these subtypes appear to have a special capacity to undergo changes in birthrates in accordance with the level of olfactory stimulation. These previous findings raised the interesting question of what type of stimulation influences neurogenesis, since naris occlusion does not only reduce the exposure to potentially thousands of odors but also to more generalized mechanical stimuli via preventing airflow.

      In this study, the authors set out to identify the stimuli that are required to promote the neurogenesis of specific OSN subtypes. Specifically, they aim to test the hypothesis that discrete odorants selectively stimulate the same OSN subtypes whose birthrates are affected. This would imply a highly specific mechanism in which exposure to certain odors can "amplify" OSN subtypes responsive to those odors suggesting that OE neurogenesis serves, in part, an adaptive function.

      To address this question, the authors focused on a family of OSN subtypes that had previously been identified to respond to musk-related odors and that exhibit higher transcript levels in the olfactory epithelium of mice exposed to males compared to mice isolated from males. First, the authors confirm via a previously established cell birth dating assay in unilateral naris occluded mice that this increase in transcript levels actually reflects a stimulus-dependent birthrate acceleration of this OSN subtype family. In a series of experiments using the same assay, they show that one specific subtype of this OSN family exhibits increased birthrates in response to juvenile male exposure while a different subtype shows increased birthrates to adult mouse exposure. In the core experiment of the study, they finally exposed naris occluded mice to a discrete odor (muscone) to test if this odor specifically accelerates the birth rates of OSN types that are responsive to this odor. This experiment reveals a complex relationship between birth rate acceleration and odor concentrations showing that some muscone concentrations affect birth rates of some members of this family and do not affect two unrelated OSN subtypes.

      In addition to the results nicely summarized by the reviewer, which focus on experiments to examine the effects of odor stimulation on unilateral naris occluded (UNO) mice, an important part of the present study is a set of experiments on non-occluded (i.e., non-UNO-treated) mice. These experiments show that: 1) the exposure of non-occluded mice to odors from adolescent male mice selectively increases quantities of newborn OSNs of the musk-responsive subtype Olfr235 (Figure 3G, H; previously Figure 6), 2) the exposure of non-occluded female mice to 2 different musk odorants (muscone, ambretone) selectively increases quantities of newborn OSNs of 3 musk-responsive subtypes: Olfr235, Olfr1440 and Olfr1431 (Figure 4D-F; previously Figure 6), and 3) the exposure of non-occluded adult female mice to a musk odorant selectively increases quantities of newborn OSNs of musk-responsive subtypes (Figure 5; previously Fig. S7). We have reorganized the revised manuscript to more prominently and clearly present the experimental design and findings of these experiments. We have also made changes to clarify (via schematics) the experimental conditions used (i.e., UNO, non-UNO, odor exposure) in each experiment.

      Strengths:

      The scientific question is valid and opens an interesting direction. The previously established cell birth dating assay in naris occluded mice is well performed and accompanied by several control experiments addressing potential other interpretations of the data.

      Weaknesses:

      (1) The main research question of this study was to test if discrete odors specifically accelerate the birth rate of OSN subtypes they stimulate, i.e. does muscone only accelerate the birth rate of OSNs that express muscone-responsive ORs, or vice versa is the birthrate of muscone-responsive OSNs only accelerated by odors they respond to?

      This question is only addressed in Figure 5 of the manuscript and the results only partially support the above claim. The authors test one specific odor (muscone) and find that this odor (only at certain concentrations) accelerates the birth rate of some musk-responsive OSN subtypes, but not two other unrelated control OSN subtypes. This does not at all show that musk-responsive OSN subtypes are only affected by odors that stimulate them and that muscone only affects the birthrate of musk-responsive OSNs, since first, only the odor muscone was tested and second, only two other OSN subtypes were tested as controls, that, importantly, are shown to be generally stimulus-independent OSN subtypes (see Figure 2 and S2).

      As a minimum the authors should have a) tested if additional odors that do not activate the three musk-responsive subtypes affect their birthrate b) chosen 2-3 additional control subtypes that are known to be stimulus-dependent (from their own 2020 study) and tested if muscone affects their birthrates.

      We appreciate these suggestions. Within the revised manuscript, we have described and included the results from several new experiments:

      (1) As noted by the reviewer, we had previously tested the effects of exposure to only one exogenous musk odorant, muscone, on quantities of newborn OSNs of the musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431. To test whether the effects observed with muscone exposure occur with other musk odorants, we assessed the effects of exposure to ambretone (5-cyclohexadecenone), a musk odorant previously found to robustly activate musk-responsive OSNs (Sato-Akuhara et al., 2016; Shirasu et al., 2014), on quantities of newborn OSNs of 3 musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431, as well as the SBT-responsive subtype Olfr912, in the OEs of non-occluded female mice. Exposure to ambretone was found to significantly increase quantities of newborn OSNs of all 3 musk-responsive subtypes (Figure 4D-F) but not the SBT-responsive subtype (Figure 4–figure supplement 4C-left), indicating that a variety of musk odorants can accelerate the birthrates of musk responsive subtypes.

      (2) To verify that exogenous non-musk odors do not increase quantities of newborn OSNs of musk responsive OSN subtypes (point a, above), we quantified newborn OSNs of 3 musk-responsive subtypes, Olfr235, Olfr1440, and Olfr1431, in non-occluded female mice that were exposed to the non-musk odorants SBT or IAA. As expected, neither of these odorants significantly affected the birthrates of the subtypes tested (Figure 4D-F).

      (3) To confirm that exogenous musk odors do not accelerate the birthrates of non-musk responsive OSN subtypes that were previously found to undergo stimulation-dependent neurogenesis (point b, above), we quantified newborn OSNs of 2 such subtypes, Olfr827 and Olfr1325, in non-occluded female mice that were exposed to muscone. As expected, exposure to muscone did not significantly affect the birthrates of either of these subtypes (Figure 4–figure supplement 4C-middle, right).

      (4) To provide additional confirmation that only some OSN subtypes have a capacity to exhibit increases in newborn OSN quantities in the presence of odors that activate them, we compared quantities of newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were exposed to 0.1% SBT versus unexposed controls. As expected, exposure to SBT caused no significant increase in quantities of newborn Olfr912 OSNs (Figure 4–figure supplement 4C-left).

      (2) The finding that Olfr1440 expressing OSNs do not show any increase in UNO effect size under any muscone concentration (Figure 5D, no significance in line graph for UNO effect sizes, middle) seems to contradict the main claim of this study that certain odors specifically increase birthrates of OSN subtypes they stimulate. It was shown in several studies that olfr1440 is seemingly the most sensitive OR for muscone, yet, in this study, muscone does not further increase birthrates of OSNs expressing olfr1440. The effect size on birthrate under muscone exposure is the same as without muscone exposure (0%).

      In contrast, the supposedly second most sensitive muscone-responsive OR olfr235 shows a significant increase in UNO effect size between no muscone exposure (0%) and 0.1% as well as 1% muscone.

      The finding that quantities of newborn Olfr1440 OSNs do not show a significantly greater UNO effect size in the OEs from mice exposed to muscone compared to control mice was also somewhat surprising to us. We think that there are two potential explanations for this result: 1) Unlike subtype Olfr235, subtype Olfr1440 exhibits a significant open-side bias in newborn OSN quantities in UNO-treated adolescent females even in the absence of exposure to muscone. We speculate that this subtype (as well as subtype Olfr1431) is stimulated by odors that are emitted by female mice at the adolescent stage, and/or by another environmental source. This may limit the influence of muscone exposure on the UNO effect size. 2) There is compelling evidence that odors within the environment can enter the closed side of the OE transnasally [via the nasopharyngeal canal (Kelemen, 1947)] and/or retronasally (via the nasopharynx) in UNO-treated mice [reviewed in (Coppola, 2012)]. Thus, it is conceivable that chronic exposure of UNO-treated mice to muscone results in the eventual entry on the closed side of the OE of muscone at concentrations sufficient to promote neurogenesis. If Olfr1440 is more sensitive to muscone than Olfr235 [e.g., (Sato-Akuhara et al., 2016; Shirasu et al., 2014)], OSNs of this subtype may be especially sensitive to small amounts of odors that enter the closed side of the OE transnasally and/or retronasally. These explanations are supported by the following results:

      - UNO-treated females exposed to 0.1% muscone show higher quantities of newborn Olfr1440 OSNs on both the open and closed sides of the OE compared to their unexposed counterparts (Figure 4–figure supplement 1A-middle). Similar results were also observed for newborn Olfr235 OSNs (Figure 4C-middle), albeit to a lesser extent, perhaps due to the lower sensitivity of this subtype to muscone.

      - In non-occluded female mice, exposure to 0.1% muscone was found to significantly increase quantities of newborn Olfr1440 OSNs, as well as newborn Olfr235 and Olfr1431 OSNs (Figure 4D-F in revised manuscript; Figure 6 in original version). Similar results were also observed upon exposure to ambretone, another musk odor (Figure 4D-F). These experiments strongly support the hypothesis that musk odors selectively increase birthrates of OSN subtypes that they stimulate.

      We have addressed these points within the results section of the revised manuscript.

      (3) The authors introduce their choice to study this particular family of OSN subtypes with first, the previous finding that transcripts for one of these musk-responsive subtypes (olfr235) are downregulated in mice that are deprived of male odors. Second, musk-related odors are found in the urine of different species. This gives the misleading impression that it is known that musk-related odors are indeed excreted into male mouse urine at certain concentrations. This should be stated more clearly in the introduction (or cited, if indeed data exist that show musk-related odors in male mouse urine) because this would be a very important point from an ethological and mechanistic point of view.

      In addition, this would also be important information to assess if the chosen muscone concentrations fall at all into the natural range.

      These are important points, which we have addressed within the revised manuscript:

      (1) Within the introduction, we have now stated that the emission of musk odors by mice has not been documented. We have also added extensive discussions of what is known about the emission of musk odors by mice in a new subsection within Results, as well as within the Discussion section. Most prominently, we have cited one study (Sato-Akuhara et al., 2016) that noted unpublished evidence for the emission of Olfr1440-activating compounds from male preputial glands: “Indeed, our preliminary experiments suggest that there are unidentified compounds that activate MOR215-1 in mouse preputial gland extracts.” Another study, which used histomorphology, metabolomic and transcriptomic analyses to compare the mouse preputial glands to muskrat scent glands, found that the two glands are similar in many ways, including molecular composition (Han et al., 2022). However, the study did not identify known musk compounds within mouse preputial glands.

      (2) Based on the reviewer’s feedback and our own curiosity, we used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk odorants, particularly those known to activate Olfr235 and Olfr1440 (Sato-Akuhara et al., 2016). Although we were unable to find evidence for known musk odorants in mouse urine extracts (possibly due to insufficient sensitivity of the assay employed), we found that preputial gland extracts contain GC-MS signals that are structurally consistent with known musk odorants. A limitation of this approach, however, is that the conclusive identification of specific musk odorants in extracts derived from mouse urine and tissues requires comparisons to pure standards, many of which we could not readily obtain. For example, we were unable to obtain a pure sample of cycloheptadecanol, a musk molecule with a predicted potential match to a signal identified within preputial gland extracts. Another limitation is that although several known musk odorants have been found to activate Olfr235 and Olfr1440 OSNs, it is conceivable that structurally distinct odorants that have not yet been identified might also activate them. The findings from these experiments have been included in a new figure within the revised manuscript (Appendix 2–figure 1).

      Related: If these are male-specific cues, it is interesting that changes in OR transcripts (Figure 1) can already be seen at the age of P28 where other male-specific cues are just starting to get expressed. This should be discussed.

      We agree that the observed changes in quantities of newborn OSNs of musk-responsive subtypes in mice exposed to juvenile male odors deserve additional discussion. We have included a more extensive discussion of this observation in both the Results and Discussion sections of the revised manuscript.

      (4) Figure 5: Under muscone exposure the number of newborn neurons on the closed sides fluctuates considerably. This doesn't seem to be the case in other experiments and raises some concerns about how reliable the naris occlusion works for strong exposure to monomolecular odors or what other potential mechanisms are at play.

      We agree that the variability in quantities of newborn OSNs of musk-responsive subtypes on the closed side of the OE of UNO-treated mice deserves further discussion. As noted above, we suspect that these fluctuations are due, at least in part, to transnasal and/or retronasal odor transfer via the nasopharyngeal canal (Kelemen, 1947) and nasopharynx, respectively [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed OE to odor concentrations that rise with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 and Olfr1440 OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440) (Figure 4C-middle, Figure 4–figure supplement 1A-middle). It is conceivable that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone reflect overstimulation-dependent reductions in survival. Our findings from UNO-based experiments are consistent with expectations that naris occlusion does not completely block exposure to odorants on the closed side, particularly at high concentrations. However, they also appear consistent with the hypothesis that exposure to musk odors promotes the neurogenesis of musk-responsive OSN subtypes.

      Considering the limitations of the UNO procedure, it is important to note that the present study also includes experimental exposure of non-occluded animals to both male odors (Figure 3G, H) and exogenous musk odorants (Figures 4D-F). Findings from the latter experiments provide strong evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included within the Results section of the revised manuscript a discussion of how observed effects of muscone exposure of UNO-treated mice may be influenced by transnasal/ retronasal odor transfer to the closed side of the OE.

      (5) In contrast to all other musk-responsive OSN types, the number of newborn OSNs expressing olfr1437 increases on the closed side of the OE relative to the open in UNO-treated male mice (Figure 1). This seems to contradict the presented theory and also does not align with the bulk RNAseq data (Figure S1).

      Subtype Olfr1437 is indeed an outlier among musk-responsive subtypes that were previously found to be more highly represented in the OSN population in 6-month-old sex-separated males compared to females (Appendix 1–figure 1) (C. van der Linden et al., 2018; Vihani et al., 2020). Somewhat unexpectedly, our findings from scRNA-seq experiments show slightly greater quantities of immature Olfr1437 OSNs on the closed side of the OE in juvenile males (Figure 1D, E of the revised manuscript, which now includes data from a second OE). Perhaps more informatively, considering the small number of iOSNs of specific subtypes in the scRNA-seq datasets, EdU birthdating experiments show no difference in newborn Olfr1437 OSN quantities on the 2 sides of the OE from UNO-treated juvenile males (Figure 2G). It is unclear to us why subtype Olfr1437 does not show open-side biases in newborn OSN quantities in juvenile male mice, but potential explanations include:

      - Age: Findings based on bulk RNA-seq that musk-responsive OSN subtypes are more highly represented in mice exposed to male odors came from mice that were 6 months old (C. van der Linden et al., 2018) or > 9 months old (Vihani et al., 2020) at the time of analysis. By contrast, the present study primarily analyzed mice that were juveniles (PD 28) at the time of scRNA-seq analysis (Figure 1) or EdU labeling (Figure 2G). It is conceivable that different musk-responsive subtypes are selectively responsive to distinct odors that are emitted at different ages. In this scenario, odors that increase the birthrates of Olfr235, Olfr1440, and Olfr1431 OSNs may be emitted starting at the juvenile stage, while those that increase the birthrate of Olfr1437 OSNs may be emitted in adulthood. In potential support of this, juvenile males exposed to their adult parents at the time of EdU labeling showed a slightly greater (although not statistically significantly different) UNO effect size in quantities of newborn Olfr1437 OSNs compared to controls (Figure 3–figure supplement 3).

      - Capacity for stimulation-dependent neurogenesis: It is also conceivable that, unlike other musk-responsive OSN subtypes, Olfr1437 OSNs lack the capacity for stimulation-dependent neurogenesis (like the SBT-responsive subtype Olfr912, for example). If so, this would imply that increased representations of Olfr1437 OSNs observed in mice exposed to male odors for long periods (C. van der Linden et al., 2018; Vihani et al., 2020) may be due to male odor-dependent increases in the lifespans of Olfr1437 OSNs.

      Within the Discussion section of the revised manuscript, we have discussed the findings concerning Olfr1437.

      (6) The authors hypothesize in relation to the accelerated birthrate of musk-responsive OSN subtypes that "the acceleration of the birthrates of specific OSN subtypes could selectively enhance sensitivity to odors detected by those subtypes by increasing their representation within the OE". However, for two other OSN subtypes that detect male-specific odors, they hypothesize the opposite "By contrast, Olfr912 (Or8b48) and Olfr1295 (Or4k45), which detect the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT) and (methylthio)methanethiol (MTMT), respectively, exhibited lower representation and/or transcript levels in mice exposed to male odors, possibly reflecting reduced survival due to overstimulation."

      Without any further explanation, it is hard to comprehend why exposure to male-derived odors should, on one hand, accelerate birthrates in some OSN subtypes to potentially increase sensitivity to male odors, but on the other hand, lower transcript levels and fail to accelerate birth rates of other OSN subtypes due to overstimulation.

      We agree that this point deserves further explanation. Within the revised manuscript, we have expanded the Introduction and Results to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. In one study (C. J. van der Linden et al., 2020), UNO treatment was found to cause a fraction of OSN subtypes to exhibit lower birthrates and representations on the closed side of the OE relative to the open. By contrast, another fraction of OSN subtypes exhibited higher representations on the closed side of the OEs of UNO-treated mice, but no difference in birthrates between the two sides. The latter subtypes were found to be distinguished by their receipt of extremely high levels of odor stimulation, suggesting that reduced odor stimulation via naris occlusion may lengthen their lifespans. In support of the possibility that this also applies to Olfr912 (and Olfr1295), which detect SBT and MTMT, respectively (Vihani et al., 2020), odors that are emitted specifically by male mice (Lin et al., 2005; Schwende et al., 1986), UNO treatment was previously found to increase total Olfr912 OSN quantities on the closed side compared to the open side in sex-separated males (C. van der Linden et al., 2018), a finding confirmed in the present study (Figure 3–figure supplement 1H).

      Taken together, findings from previous studies as well as the current one indicate that olfactory stimulation can accelerate the birthrates and/or reduce the lifespans of OSNs, depending on the specific subtypes and odors within the environment. As we have now indicated in the Discussion, we do not yet know what distinguishes subtypes that undergo stimulation-dependent neurogenesis, but it is conceivable that they detect odors with a particular salience to mice. Thus, observations that some odorants (e.g., musks) cause stimulation-dependent neurogenesis while others do not (e.g., SBT) might reflect an animal’s specific need to adapt its sensitivity to the former. Alternatively, it is conceivable that stimulation-dependent reductions in representations of subtypes such as Olfr912 and Olfr1295 reflect a fundamentally different mode of plasticity that is also adaptive, as has been hypothesized (C. van der Linden et al., 2018; Vihani et al., 2020).

      Reviewer #1 (Recommendations For The Authors):

      To support the main claim, several controls are necessary as mentioned under point 1 of the public review.

      As outlined in our responses to the public review, new experiments within the revised manuscript indicate the following:

      (1) Accelerated birthrates of 3 different musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) are observed in non-occluded mice following exposure to multiple exogenous musk odorants (muscone, ambretone) (Figure 4D-F).

      (2) Exposure of non-occluded mice to non-musk odors (SBT, IAA) does not accelerate the birthrates of musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) (Figure 4D-F).

      (3) Exposure of mice to exogenous musk odors (muscone, ambretone) does not accelerate the birthrates of non-musk responsive OSN subtypes (e.g., Olfr912), including those previously found to undergo stimulation-dependent neurogenesis (Olfr827, Olfr1325) (Figure 4–figure supplement 4C).

      (4) Only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them (e.g., Olfr912 birthrates are not accelerated by SBT exposure) (Figure 4–figure supplement 4C-left).

      In addition, this study could be considerably improved by showing that the proposed mechanism applies beyond a single OSN subtype (olfr235), especially since the most sensitive OR subtype (expressing olfr1440) does not align with the main claim. The introduction states that this is difficult because the ligands for many ORs are unknown including all subtypes previously found to undergo stimulation-dependent neurogenesis referring to your 2020 study. While this reviewer agrees that the lack of deorphanization is a significant hurdle in the field, the 2020 study states that about 4% of all ORs (which should equal >40 ORs) show a stimulus-dependent down-regulation on the closed side, not only the 7 ORs which are closer examined (Figure 1). It would tremendously improve the impact of the current study to show that the proposed effect applies also to one of these other >40 ORs.

      We appreciate this question, as it alerted us to some shortcomings in how our findings were presented within the original manuscript. We respectfully disagree that only findings regarding subtype Olfr235 align with the main hypothesis of this study, which is that discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate. Specifically, we would like to draw attention to experiments on non-occluded female mice exposed to exogenous musk odorants (muscone, ambretone; revised Figures 4D-F; previously, Figure 6). Findings from these experiments provide compelling evidence that exposure to musk odorants causes selective increases in the birthrates of three different musk-responsive OSN subtypes: Olfr235, Olfr1440, and Olfr1431. Thus, we would suggest that results from the present study already show that the proposed mechanism applies to more than just the Olfr235 subtype. However, we agree with what we think is the essence of the reviewer’s point: that it is important to determine the extent to which this mechanism applies to OSN subtypes that are responsive to other (i.e., non-musk) odorants. While, as noted by the reviewer, our previous study identified several OSN subtypes that undergo stimulation-dependent neurogenesis (as well as many others that are predicted to do so) (C. J. van der Linden et al., 2020), we are not aware of ligands that have been identified with high confidence for those subtypes. Although we are in the process of conducting experiments to identify additional odor/subtype pairs to which the mechanism described in this study applies, the early-stage nature of these experiments precludes their inclusion in the present manuscript.

      The ethological and mechanistic relevance of the current study could be significantly improved by showing that musk-related odors that activate olfr235 are actually found in male mouse urine (and additionally are not found in female mouse urine). Otherwise, the implicated link between the acceleration of OSN birthrates by exposure to male odors and acceleration by specific monomolecular odors does not hold, raising the question of any natural relevance (e.g. the proposed adaptive function to increase sensitivity to certain odors).

      As noted in our responses to the public review, we have addressed this important point within the revised manuscript as follows:

      (1) We have included an extensive discussion of what is known about the emission of musk-like odors by mice.

      (2) We have used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk compounds. Although inconclusive, we report that preputial glands contain signals that are structurally consistent with known musk compounds. The findings of these experiments have been included in the revised manuscript (new Appendix 2–figure 1), along with a discussion of their limitations.

      Reviewer #2 (Public Review):

      In their paper entitled "In mice, discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate" Hossain et al. address lifelong neurogenesis in the mouse main olfactory epithelium. The authors hypothesize that specific odorants act as neurogenic stimuli that selectively promote biased OR gene choice (and thus olfactory sensory neuron (OSN) identity). Hossain et al. employ RNA-seq and scRNA-seq analyses for subtype-specific OSN birthdating. The authors find that exposure to male and musk odors accelerates the birthrates of the respective responsive OSNs. Therefore, Hossain et al. suggest that odor experience promotes selective neurogenesis and, accordingly, OSN neurogenesis may act as a mechanism for long-term olfactory adaptation.

      We appreciate this summary but would like to underscore that a mechanism involving biased OR gene choice is just one of two possibilities proposed in the Discussion section to explain how odorant stimulation of specific subtypes accelerates the birthrates of those subtypes.

      The authors follow a clear experimental logic, based on sensory deprivation by unilateral naris occlusion, EdU labeling of newborn neurons, and histological analysis via OR-specific RNA-FISH. The results reveal robust effects of deprivation on newborn OSN identity. However, the major weakness of the approach is that the results could, in (possibly large) parts, depend on "downregulation" of OR subtype-specific neurogenesis, rather than (only) "upregulation" based on odor exposure. While, in Figure 6, the authors show that the observed effects are, in part, mediated by odor stimulation, it remains unclear whether deprivation plays an "active" role as well. Moreover, as shown in Figure 1C, unilateral naris occlusion has both positive and negative effects in a random subtype sample.

      In our view, the present study involves two distinct and complementary experimental designs: 1) odor exposure of UNO-treated animals and 2) odor exposure of non-occluded animals. Here we address this comment with respect to each of these designs:

      (1) For experiments performed on UNO-treated animals, we agree that observed differences in birthrates on the open and closed sides of the OE reflect, largely, a deceleration (i.e., downregulation) of the birthrates of these subtypes on the closed side relative to the open (as opposed to an acceleration of birthrates on the open side). Our objective in using this design was to test the extent to which specific OSN subtypes undergo stimulation-dependent neurogenesis under various odor exposure conditions. According to the main hypothesis of this study, a lower birthrate of a specific OSN subtype on the closed side of the OE compared to the open is predicted to reflect a lower level of odor stimulation on the closed side received by OSNs of that subtype. However (and as described in our responses to reviewer #1), a limitation of this design is that environmental odorants, especially at high concentrations, are likely to stimulate responsive OSNs on the closed side of the OE in addition to the open side due to transnasal and/or retronasal air flow.

      (2) Experiments performed on non-occluded animals were designed to provide critical complementary evidence that specific OSN subtypes undergo accelerated neurogenesis in the presence of specific odors. Using this design, we have found compelling evidence that:

      - Exposure of non-occluded mice to male odors causes the selective acceleration of the birthrate of Olfr235 OSNs (Figure 3G, H).

      - Exposure of non-occluded female mice to two different musk odorants (muscone and ambretone) selectively accelerates the birthrates of three different musk-responsive subtypes: Olfr235, Olfr1440, and Olfr1431 (Figure 4D-F and Figure 4–figure supplement 4C).

      We have reorganized the revised manuscript to more clearly present the most important experimental findings using these two experimental designs. We have also highlighted (via schematics) the experimental conditions (e.g., UNO, non-occlusion, odor exposure) used for each experiment.

      Another weakness is that the authors build their model (Figure 8), specifically the concept of selectivity, on a receptor-ligand pair (Olfr912 that has been shown to respond, among other odors, to the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT)) that would require at least some independent experimental corroboration. At least, a control experiment that uses SBT instead of muscone exposure should be performed.

      We agree that this important concern deserves additional control experiments and discussion. We have addressed this concern within the revised manuscript as follows:

      - Within the Results section, we have added multiple new control experiments (detailed in response to Reviewer #1), including the one recommended above. As suggested, we quantified newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were either exposed to 0.1% SBT or left unexposed as controls. Exposure to SBT was found to cause no significant increase in quantities of newborn Olfr912 OSNs (newly added Figure 4–figure supplement 4C-left). These findings further support the model in Figure 7 (previously Figure 8) that only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them.

      - Also within the Results section, we have made efforts to better highlight relevant control experiments that were included in the original version, particularly those showing that quantities of newborn Olfr912 OSNs are not affected by UNO in mice exposed to male odors (Figure 2H and Figure 3–figure supplement 1G; previously Figure 2F and Figure 3H) or by exposure of non-occluded females to male odors (Figure 3H; previously Figure 6E). Since Olfr912 is responsive to component(s) of male odors (C. van der Linden et al., 2018; Vihani et al., 2020), these results indicate that this subtype does not have the capacity for stimulation-dependent neurogenesis, which is consistent with our previous findings that only a fraction of subtypes have this capacity (C. J. van der Linden et al., 2020).

      In this context, it is somewhat concerning that some results, which appear counterintuitive (e.g., lower representation and/or transcript levels of Olfr912 and Olfr1295 in mice exposed to male odors) are brushed off as "reflecting reduced survival due to overstimulation." The notion of "reduced survival" could be tested by, for example, a caspase3 assay.

      This is a point that we agree deserves further discussion. Please see the explanation that we have outlined above in response to Reviewer #1.

      Within the revised manuscript, we have expanded the Introduction to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. We outline evidence from previous studies that Olfr912 and Olfr1295 belong to the latter category, and that the representations of these subtypes are likely reduced by male odor overstimulation-dependent shortening of OSN lifespan.

      Important analyses that need to be done to better be able to interpret the findings are to present (i) the OR+/EdU+ population of olfactory sensory neurons not just as a count per hemisection, but rather as the ratio of OR+/EdU+ cells among all EdU+ cells; and (ii) to the ratio of EdU+ cells among all nuclei (UNO versus open naris). This way, data would be normalized to (i) the overall rate of neurogenesis and (ii) any broad deprivation-dependent epithelial degeneration.

      We have addressed this concern in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
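
      The three normalization schemes compared above can be sketched as follows. This is only an illustration, not the analysis code used in the study: the function name, the dictionary keys, and the choice to average per-section ratios (rather than pooling counts across sections) are all assumptions made for the example.

```python
from statistics import mean


def normalize_newborn_counts(sections):
    """Average three normalized measures of newborn OSNs over half-sections.

    Each entry of `sections` is a dict with hypothetical keys:
      'or_edu'    : number of OR+/EdU+ cells in the half-section
      'edu'       : number of all EdU+ cells in the half-section
      'dapi_area' : DAPI+ area of the half-section (arbitrary units)
    """
    return {
        # Method 1: OR+/EdU+ cells per half-section
        "per_half_section": mean(s["or_edu"] for s in sections),
        # Method 2: OR+/EdU+ cells per EdU+ cell (reviewer suggestion i)
        "per_edu_cell": mean(s["or_edu"] / s["edu"] for s in sections),
        # Method 3: OR+/EdU+ cells per unit DAPI+ area (reviewer suggestion ii)
        "per_dapi_area": mean(s["or_edu"] / s["dapi_area"] for s in sections),
    }


# Made-up counts for two half-sections, for illustration only
example = [
    {"or_edu": 4, "edu": 200, "dapi_area": 2.0},
    {"or_edu": 6, "edu": 300, "dapi_area": 3.0},
]
result = normalize_newborn_counts(example)
```

      If all three measures change in the same direction and by a similar proportion between conditions, the choice of denominator has little influence on the conclusion, which is the comparison described above.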

      Finally, the paper will benefit from improved data presentation and adequate statistical testing. Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH, are hard to interpret. Moreover, t-tests should not be employed when data is not normally distributed (as is the case for most of their samples).

      We have made extensive changes within the revised manuscript to increase the clarity and interpretability of the figures, including:

      (1) Addition of a split-channel, high-magnification view of a representative image that shows the overlap of FISH and EdU signals (Figure 2D).

      (2) Addition of experimental schematics and timelines corresponding to each set of experiments.

      In the revised manuscript, several changes to the statistical tests have been made, as follows:

      (1) To assess deviation from normality of the histological quantifications of newborn and total OSNs of specific subtypes in this study, all datasets were tested using the Shapiro-Wilk test for non-normality and the P values obtained are included in Supplementary file 1 (figure source data). Of the 274 datasets tested, 253 were found to have Shapiro-Wilk P values > 0.05, indicating that the vast majority (92%) do not show evidence of significant deviation from a normal distribution.

      (2) A general lack of deviation of the datasets in this study from a normal distribution is further supported by quantile-quantile (QQ) plots, which compare actual data to a theoretically normal distribution (Appendix 4–figure 1). The datasets analyzed were separated into the following categories:

      a. Quantities of newborn OSNs in UNO treated mice (Appendix 4-figure 1A)

      b. Quantities of total OSNs in UNO treated mice (Appendix 4-figure 1B)

      c. Quantities of newborn OSNs in non-occluded mice (Appendix 4-figure 1C)

      d. UNO effect sizes for newborn or total OSNs (Appendix 4-figure 1D)
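
      For readers unfamiliar with QQ plots, the points of such a plot can be computed as in the sketch below. It is a generic illustration under stated assumptions, not the procedure used to generate Appendix 4–figure 1: the function name, the plotting positions (i + 0.5)/n, and the example values are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with its theoretical normal quantile.

    The reference normal distribution uses the sample mean and sample
    standard deviation, so points near the identity line suggest no
    marked deviation from normality. Requires at least two values.
    """
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    # Plotting positions (i + 0.5) / n avoid probabilities of exactly 0 or 1
    return [(fitted.inv_cdf((i + 0.5) / n), obs) for i, obs in enumerate(ordered)]


# Made-up quantities, for illustration only
observations = [3.1, 4.0, 5.2, 2.7, 4.4]
points = qq_points(observations)
```

      Each pair is (theoretical quantile, observed value); plotting the second against the first gives the QQ plot.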

      (3) Results of both parametric and non-parametric statistical tests of comparisons in this study have been included in Supplementary file 2 (statistical analyses). In general, the results from parametric and non-parametric tests are in good agreement.

      (4) Statistical analyses of differences in OSN quantities in the OEs of non-occluded mice or UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions have now been performed using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli.
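
      The two-stage linear step-up procedure named in point (4) can be sketched in a few lines. The code below is a minimal illustration of the published procedure applied to a list of P values; it is not the software used for the analyses in this study, and the function names, the FDR level q = 0.05, and the example P values are assumptions.

```python
def bh_reject_count(pvalues, q):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvalues)
    count = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * q / m:
            count = rank  # step-up: keep the largest rank that passes
    return count


def two_stage_step_up(pvalues, q=0.05):
    """Two-stage linear step-up procedure; True marks a rejected hypothesis."""
    m = len(pvalues)
    q1 = q / (1 + q)
    # Stage 1: Benjamini-Hochberg at the reduced level q / (1 + q)
    r1 = bh_reject_count(pvalues, q1)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    # Stage 2: estimate the number of true nulls and rerun at a rescaled level
    m0_hat = m - r1
    r2 = bh_reject_count(pvalues, q1 * m / m0_hat)
    threshold = sorted(pvalues)[r2 - 1]
    return [p <= threshold for p in pvalues]


# Made-up P values, for illustration only
decisions = two_stage_step_up([0.001, 0.008, 0.039, 0.041, 0.60], q=0.05)
```

      Because the second stage uses an estimate of the number of true null hypotheses, it can reject more hypotheses than a single Benjamini-Hochberg pass at the same nominal level.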

      Reviewer #2 (Recommendations for the Authors):

      The manuscript by Hossain et al. would benefit from a thorough revision. Here, we outline several points that should be addressed:

      Figure 3E - I & Figure 4E&F: Red lines that connect mean values are misleading.

      Within the revised manuscript, the UNO effect size graphs have been modified for clarity, including removal of the lines between mean values except for those comparing changes over time post EdU injection (Figure 6 and Figure 6-figure supplement 1). For these latter graphs, we think that lines help to illustrate changes in effect sizes over time.

      Figure 3E - I: UNO effect sizes (right) should be tested via ANOVA.

      In the revised manuscript, statistical analyses of UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions were done using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (Figure 2-figure supplement 2; Figure 3; Figure 3-figure supplement 1; Figure 4; Figure 4-figure supplements 1, 2). The same tests were used for analysis of differences in OSN quantities in the OEs of non-occluded mice subjected to more than two different experimental conditions (Figure 3; Figure 3-figure supplement 2; Figure 4; Figure 4-figure supplements 3, 4). For comparisons of differences in quantities of newborn OSNs of musk-responsive subtypes at 4 and 7 days post-EdU between non-occluded mice exposed and unexposed to muscone, a two-sample ANOVA - fixed-test, using F distribution (right-tailed) was used (Figure 6; Figure 6-figure supplement 1).

      Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH: Colabeling is hard / often impossible to discern. Show zoom-ins and better explain the criteria for "colabeling" in the methods.

      In the revised manuscript an enlarged and split-channel view of an image showing multiple newborn Olfr235 OSNs (OR+/EdU+) has been added (Figure 2D). A detailed description of the criteria for OR+/EdU+ OSNs is provided in Methods under the section “Histological quantification of newborn and total OSNs of specific subtypes.”

      Figure 1C: add Olfr912.

      As a control group for iOSN quantities of musk-responsive subtypes in Figure 1, we selected random subtypes that are expressed in the same zones: 2 and 3. Olfr912 OSNs were not included because this subtype was not randomly chosen, nor is it expressed in the same zones (Olfr912 is expressed in zone 4). We also note that the scRNA-seq analysis was done to allow an initial exploration of the hypothesis that some OSN subtypes that are more highly represented in mice exposed to male odors show stimulation-dependent neurogenesis. Considering that the scRNA-seq datasets contain only small numbers of iOSNs of specific subtypes, we think they are more useful for analyzing changes in birthrates within groups of subtypes (e.g., musk responsive, random) rather than individual subtypes.

      The time of OE dissection is different for data shown in Figure 1 (P28) as compared to other figures (P35). Please comment/discuss.

      Within the Results section of the revised manuscript, we have now clarified that the PD 28 timepoint chosen for EdU birthdating in the histological quantification of newborn OSNs of specific subtypes is analogous to the PD 28 timepoint chosen for identification of immature (Gap43-expressing) OSNs in the scRNA-seq samples. In the case of EdU birthdating, it is necessary to provide a chase period of sufficient length to enable robust and stable expression of an OR, which defines the subtype. A chase period of 7 days was chosen based on a previous study (C. J. van der Linden et al., 2020). Hence, a dissection date of PD 35 was chosen.

      Figure 3F&G: please discuss the female → female effects

      In the Results and Discussion sections of the revised manuscript, we discuss our observation that the Olfr1440 and Olfr1431 subtypes show significantly higher quantities of newborn OSNs on the open side compared to closed sides in UNO-treated females. We speculate that these subtypes may receive some odor stimulation in juvenile females, perhaps via musk or related odors emitted by females themselves or from elsewhere within the environment.

      Figure 4E (and other examples): male → male displays two populations (no effect versus effect); please explain/speculate.

      For some UNO effect sizes, there appears to be a high degree of variation among mice, and, in some cases, this diversity appears to cause the data to separate into groups. We assessed whether this diversity might reflect mice that came from different litters, but this is not the case. Rather, we speculate that the observed diversity most likely reflects low representations of newborn OSNs of some subtypes and/or under specific conditions. The data referred to by the reviewer (now Figure 3–figure supplement 3D), for example, show UNO effect sizes for quantities of newborn OSNs of subtype Olfr1431, which has the lowest representation among the musk-responsive subtypes analyzed in this study.

      Figure 5C-E: It is unclear why strong muscone concentrations (10%) have no effect, whereas no muscone sometimes (D&E) has an effect.

      As discussed in response to comments from Reviewer #1, we speculate that fluctuations in UNO effect sizes in muscone-exposed mice, particularly at high muscone concentrations, may be due, at least in part, to transnasal and/or retronasal air flow [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed side of the OE to muscone concentrations that increase with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 (Figure 4C-middle) and Olfr1440 (Figure 4–figure supplement 1A-middle) OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440). We speculate that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone may reflect overstimulation-dependent reductions in survival.

      As emphasized above, our study also includes experiments on non-occluded animals (Figures 3, 4, 5). Findings from these experiments provide additional evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included an extensive interpretation of UNO-based experiments, including their limitations, within the Results section of the revised manuscript.

      Figure S1: please explain the large error bars regarding "Transcript level".

      We have clarified that the error bars in this figure, which is now Appendix 1–figure 1, correspond to 95% confidence intervals.

      The figure captions could be improved for ease of reading.

      Figure captions have been revised for increased clarity.

      Figure 4: Include Olfr235 data for consistency.

      All OSN subtypes analyzed for the effects of exposure to adult mice on UNO-induced open-side biases in quantities of newborn OSNs have been included in a single figure, which is now Figure 3–figure supplement 3.

      Figure S6F&G: Do not run statistics on n = 2 (G) or 3 (F) samples.

      We have removed statistical test results from comparisons involving fewer than 4 observations.

      Reviewer #3 (Public Review):

      Summary:

      Neurogenesis in the mammalian olfactory epithelium persists throughout the life of the animal. The process replaces damaged or dying olfactory sensory neurons. It has been tacitly assumed that replacement of the OR subtypes is stochastic, although anecdotal evidence has suggested that this may not be the case. In this study, Santoro and colleagues systematically test this hypothesis by answering three questions: is there enrichment of specific OR subtypes associated with neurogenesis? Is the enrichment dependent on sensory stimulus? Is the enrichment the result of differential generation of the OR type or from differential cell death regulated by neural activity? The authors provide some solid evidence indicating that musk odor stimulus selectively promotes the OR types expressing the musk receptors. The evidence argues against a random selection of ORs in the regenerating neurons.

      Strengths:

      The strength of the study is a thorough and systematic investigation of the expression of multiple musk receptors with unilateral naris occlusion or under different stimulus conditions. The controls are properly performed. This study is the first to formulate the selective promotion hypothesis and the first systematic investigation to test it. The bulk of the study uses in situ hybridization and immunofluorescent staining to estimate the number of OR types. These results convincingly demonstrate the increased expression of musk receptors in response to male odor or muscone stimulation.

      Weaknesses:

      A major weakness of the current study is the single-cell RNASeq result. The authors use this piece of data as a broad survey of receptor expression in response to unilateral nasal occlusion. However, several issues with this data raise serious concerns about the quality of the experiment and the conclusions. First, the proportion of OSNs, including both the immature and mature types, constitutes only a small fraction of the total cells. In previous studies of the OSNs using the scRNASeq approach, OSNs constitute the largest cell population. It is curious why this is the case. Second, the authors did not annotate the cell types, making it difficult to assess the potential cause of this discrepancy. Third, given the small number of OSNs, it is surprising to have multiple musk receptors detected in the open side of the olfactory epithelium whereas almost none in the closed side. Since each OR type only constitutes ~0.1% of OSNs on average, the number of detected musk receptors is too high to be consistent with our current understanding and the rest of the data in the manuscript. Finally, unlike the other experiments, the authors did not describe any method details, nor was there any description of quality controls associated with the experiment. The concerns over the scRNASeq data do not diminish the value of the data presented in the bulk of the study but could be used for further analysis.

      We are grateful to the reviewer for raising these important questions.

      In the revised manuscript, we have clarified that the scRNA-seq dataset presented in the original version of the manuscript (now called dataset OE 1) was published and described in detail in a previous study (C. J. van der Linden et al., 2020). The reviewer is correct that the proportion of OSNs within that dataset was lower than in other datasets that have been published more recently (using updated methods). We think this is likely because of the way that the cells were processed (e.g., from cryopreserved single cells followed by live/dead selection). However, because the open and closed sides were processed identically, we do not expect the ratios of OSNs of specific subtypes to be greatly affected. Hence, the differences observed for specific OSN subtypes on the open versus closed sides are expected to be valid.

      As the reviewer notes, there is a surprisingly large difference between the number of OSNs of musk-responsive subtypes on the open and closed sides within the OE 1 dataset. This difference is a key piece of information that led us to formulate the hypothesis in the study: that musk-responsive subtypes are born at a higher rate in the presence of male/musk odor stimulation. And while it is true that, on average, each subtype represents ~0.1% of the population, it is known that there is wide variance in representations among different subtypes [e.g., (Ibarra-Soria et al., 2017)]. The frequencies of the musk-responsive subtypes among all OSNs on the open side of OE 1 (0.3% for Olfr235, 0.4% for Olfr1440, 0.06% for Olfr1434, 0% for Olfr1431, and 1% for Olfr1437) are in line with previous findings.

      To confirm that the scRNA-seq findings from dataset OE 1 are not an artifact of the cell preparation methods used, we generated a second scRNA-seq dataset, OE 2, which has been added to the revised manuscript (Figure 1). The OE 2 dataset was prepared according to the same experimental timeline as OE 1, but the cells were captured immediately after dissociation and live/dead sorting via FACS. As expected, most cells within the OE 2 dataset are OSNs (77% on the open side, 66% on the closed). Importantly, like the OE 1 dataset, the OE 2 dataset shows higher quantities of iOSNs of musk-responsive subtypes on the open side of the OE compared to the closed (normalized for either total cells or total OSNs) (Figure 1–figure supplement 1D, E).

      A weakness of the experiment assessing musk receptor expression is that the authors do not distinguish immature from mature OSNs. Immature OSNs express multiple receptor types before they commit to the expression of a single type. The experiments do not reveal whether mature OSNs maintain an elevated expression level of musk receptors.

      While it is established that multiple ORs are coexpressed at a low level during OSN differentiation (Bashkirova et al., 2023; Fletcher et al., 2017; Hanchate et al., 2015; Pourmorady et al., 2024; Saraiva et al., 2015; Scholz et al., 2016; Tan et al., 2015), this has been found to occur primarily at the immediate neuronal precursor 3 (INP3) stage (Bashkirova et al., 2023; Fletcher et al., 2017), which is characterized by expression of Tex15 (Fletcher et al., 2017; Pourmorady et al., 2024) and precedes the immature OSN (iOSN) stage, which is characterized by expression of Gap43 (Fletcher et al., 2017; McIntyre et al., 2010; Verhaagen et al., 1989). Within the scRNA-seq datasets in the present study, iOSNs of specific subtypes are identified based on robust expression of Gap43 (Log<sub>2</sub> UMI > 1) and a specific OR gene (Log<sub>2</sub> UMI > 2), as described in the figures and methods. Thus, the cells defined as iOSNs are expected to express a single OR gene and this expression should be maintained as iOSNs transition to mOSNs. To confirm these predictions, we carried out a detailed analysis of OR expression at three different stages of OSN differentiation: INP3, iOSN, and mOSN (Figure 1–figure supplement 2). The cells chosen for analysis express the musk-responsive ORs Olfr235 or Olfr1440 or a randomly chosen OR Olfr701, in addition to markers that define INP3, iOSN, or mOSN cells. As expected, individual iOSNs and mOSNs of musk-responsive subtypes were found to exhibit robust and singular OR expression on the open and closed sides of OEs from UNO-treated mice. Moreover, and as observed previously, INP3 cells coexpress multiple OR transcripts at low levels. A detailed description of how the analysis was performed is included in the Methods section under Quantification and statistical analysis.
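
      The thresholds used to define iOSNs of a specific subtype can be expressed as a simple filter. The sketch below is illustrative only and is not the analysis code from the Methods: the function name and input format are hypothetical, the thresholds are assumed to apply to the base-2 logarithm of raw UMI counts, and the requirement that exactly one OR gene passes its threshold is an added assumption reflecting the expectation of singular OR expression.

```python
from math import log2

GAP43_MIN_LOG2_UMI = 1  # from the text: Gap43 Log2 UMI > 1
OR_MIN_LOG2_UMI = 2     # from the text: OR gene Log2 UMI > 2


def iosn_subtype(cell_umis, or_genes):
    """Return the OR gene defining a cell's iOSN subtype, or None.

    cell_umis : dict mapping gene name to UMI count (hypothetical format)
    or_genes  : iterable of OR gene names to consider
    """
    def passes(gene, cutoff):
        count = cell_umis.get(gene, 0)
        return count > 0 and log2(count) > cutoff

    # The cell must robustly express the immature OSN marker Gap43
    if not passes("Gap43", GAP43_MIN_LOG2_UMI):
        return None
    # Assumption: require exactly one robustly expressed OR gene
    robust = [gene for gene in or_genes if passes(gene, OR_MIN_LOG2_UMI)]
    return robust[0] if len(robust) == 1 else None


# Made-up UMI counts, for illustration only
ors = ["Olfr235", "Olfr1440", "Olfr701"]
single_or_cell = iosn_subtype({"Gap43": 8, "Olfr235": 16, "Olfr701": 1}, ors)
low_gap43_cell = iosn_subtype({"Gap43": 2, "Olfr235": 16}, ors)
```

      Under these assumptions, a cell with low-level transcripts of additional ORs is still assigned to the single OR that exceeds the threshold, whereas a cell lacking robust Gap43 expression is not counted as an iOSN.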

      Within the histology-based quantifications, newborn OSNs are identified based on their robust RNA-FISH signals corresponding to a specific OR transcript and an EdU label. Considering the EdU chase time of 7 days, most EdU-positive cells are expected to have passed the INP3 stage and be iOSNs or mOSNs. Moreover, considering the low level of OR expression within INP3 cells, it is unlikely OR transcripts are expressed at a high enough level to be detectable and/or counted at this stage and thereby affect newborn OSN quantifications.

      There are also two conceptual issues that are of concern. The first is the concept of selective neurogenesis. The data show an increased expression of musk receptors in response to male odor stimulation. The authors argue that this indicates selective neurogenesis of the musk receptor types. However, it is not clear what the distinction is between elevated receptor expression and a commitment to a specific fate at an early stage of development. As immature OSNs express multiple receptors, a likely scenario is that some newly differentiated immature OSNs have elevated expression of not only the musk receptors but also other receptors. The current experiments do not distinguish the two alternatives. Moreover, as pointed out above, it is not clear whether mature OSNs maintain the increased expression. Although a scRNASeq experiment can clarify it, the authors, unfortunately, did not perform an in-depth analysis to determine at which point of neurogenesis the cells commit to a specific musk receptor type. The quality of the scRNASeq data unfortunately also does not lend confidence for this type of analysis.

      The addition of a second scRNA-seq dataset within the revised manuscript (Figure 1), combined with the new scRNA-seq-based analyses of OR expression in INP3, iOSN, and mOSN cells (Figure 1-figure supplement 2), provide strong evidence that iOSNs and mOSNs robustly express a single OR gene and that cellular expression is stable from the iOSN to the mOSN stage. These analyses do not support a scenario in which odor stimulation causes upregulated expression of multiple ORs and thereby causes apparent increases in quantities of newly generated OSNs that express musk-responsive ORs. Rather, the data firmly support a mechanism in which odor stimulation increases quantities of newly generated OSNs that have stably committed to the robust expression of a single musk-responsive OR.

      A second conceptual issue, the idea of homeostasis in regeneration, which the authors presented in the Introduction, needs clarification. In its current form, it is confusing. It could mean maintenance of the distribution of receptor types, or it could mean the proper replacement of a specific OR type upon the loss of this type. The authors seem to refer to the latter and should define it properly.

      We have revised the Introduction section to clarify our use of the term homeostatic in one instance (paragraph 4) and replace it with more specific language in a second instance (paragraph 5).

      Reviewer #3 (Recommendations For The Authors):

      Concerns over scRNASeq data. It appears that the samples may have included non-OE tissues, which reduced the representation of the OSNs. This experiment may need to be repeated to increase the number of OSNs.

      As outlined in the response to the public comments, we think that the low proportion of OSNs in the OE 1 data set reflects how the cells were prepared and processed. We have now included a second scRNA-seq dataset to address this concern.

      Cell types should be identified in the scRNASeq analysis, and the number of cells documented for each cell type, at least for the OSNs. The data should be made available for general access.

      We have now clarified that the OE 1 dataset was published as part of a previous study (C. J. van der Linden et al., 2020) and was made publicly available as part of that study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157119). All cell types in the newly generated OE 2 dataset have been annotated (Figure 1) and this dataset has also been made publicly available (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE278693). The numbers and percentages of OSNs within OE 1 and OE 2 datasets have been added to the legend of Figure 1-figure supplement 1.

      The specific OR types should be segregated for mature and immature OSNs. The percentage of a specific OR type should be normalized to the total number of OSNs, rather than the total cells. The current quantification is misleading because it gives the false sense that the muscone receptors represent ~0.1% of cells when the proportion is much higher if only OSNs are considered.

      In the revised manuscript, quantities of iOSNs (Gap43+ cells) of specific subtypes within the OE 1 and OE 2 scRNA-seq datasets are graphed as percentages of both all OSNs (Figure 1E, Figure 1–figure supplement 1D) and all cells (Figure 1–figure supplement 1E). As a percentage of all OSNs, average quantities of iOSNs of musk-responsive subtypes on the open side of the OE range from 0.005% (for Olfr1431) to 0.14% (for Olfr1440) (Figure 1E).

      Within the feature plots for the two datasets, the differentiation stages of indicated OSNs have been clearly defined within the figures and figure legends. For the OE 1 dataset, iOSNs are differentiated from mOSNs by arrows (Figure 1–figure supplement 1C). For the OE 2 dataset (Figure 1D), only immature OSNs are shown for simplicity.

      Technical details of the scRNASeq should be documented. In the feature plot of musk-responsive receptors (Figure 1D), it is better to use the actual quantity of expression rather than a binarized representation (with or without an OR). If one needs to use on/off to determine the number of cells for a given OR type, then the criteria of selection should be given.

      Technical details of generation of the scRNA-seq datasets have been documented in the “Method details” section (for the OE 2 dataset) and in the method section of our previous publication of the OE 1 dataset (C. J. van der Linden et al., 2020). Details of the scRNA-seq analyses, including the criteria used to define immature OSNs of specific subtypes, are documented within the “Quantification and statistical analysis” section.

      Within the feature plots, we have decided to show OSNs of a given subtype in a binary fashion using specific colors for the sake of simplicity (Figure 1D, Figure 1-figure supplement 1C). To address the reviewer’s concern, we have added a new figure that provides detailed information about OR transcript expression (levels and genes) within iOSNs and mOSNs of two different musk-responsive subtypes and a randomly chosen subtype (Figure 1-figure supplement 2).

      An in-depth analysis of the onset of OR expression in the GBC, INP, immature, and mature OSNs should be performed. It is also important to determine how many other receptors are detected in the cells that express the musk receptors. The current scRNASeq data may not be of sufficiently high quality and the experiment needs to be repeated. It is also important for the authors to take measures to eliminate ambient RNA contamination.

      The revised manuscript includes a second scRNA-seq dataset (OE 2; Figure 1). Details of how both the original (OE 1) and new datasets were generated have been documented within the Methods sections of the corresponding publications [(C. J. van der Linden et al., 2020); present study]. For both datasets, live/dead selection of cells was performed, which was expected to reduce ambient RNA.

      The revised manuscript also includes a new figure that provides detailed information about OR transcript expression within INP3, iOSN and mOSN cells that express one of two different musk-responsive ORs or a randomly chosen OR (Figure 1-figure supplement 2). These data reveal, as reported previously (Bashkirova et al., 2023; Fletcher et al., 2017; Pourmorady et al., 2024), that low levels of multiple OR transcripts are detected in INP3 (Tex15+) cells. By contrast, iOSN (Gap43+) and mOSN (Omp+) cells robustly express a single OR, with little or no expression of other ORs.

      Quantification of cells for Figures 2-7 should be changed. Instead of using cell number per 1/2 section, the data should be calculated using density (using the area of the epithelium) or normalized to the total number of cells (based on DAPI staining). This is because multiple sections are taken from the same mouse along the A-P axis. These sections have different sizes and numbers of cells.

      As noted in response to a similar concern of Reviewer #2, this has been addressed in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
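      The three normalization methods compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' quantification pipeline: the per-section field names (`or_edu`, `edu`, `dapi_area`), the `normalize` function, and all numbers are hypothetical, and only two sections per side are used for brevity, whereas the text describes averaging at least 5 sections per OE.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical per-half-section record:
#   or_edu    -> count of OR+/EdU+ (newborn) OSNs of one subtype
#   edu       -> count of all EdU+ cells
#   dapi_area -> DAPI+ area in arbitrary units
Section = Dict[str, float]


def normalize(sections: List[Section], method: str) -> float:
    """Average a newborn-OSN measure across sections using one of three methods."""
    if method == "half_section":      # method 1: counts per half-section
        values = [s["or_edu"] for s in sections]
    elif method == "per_edu":         # method 2: per total EdU+ cells
        values = [s["or_edu"] / s["edu"] for s in sections]
    elif method == "per_dapi_area":   # method 3: per unit of DAPI+ area
        values = [s["or_edu"] / s["dapi_area"] for s in sections]
    else:
        raise ValueError(f"unknown normalization method: {method}")
    return mean(values)


# Made-up open-side and closed-side sections from one UNO-treated animal.
open_side = [
    {"or_edu": 4, "edu": 200, "dapi_area": 2.0},
    {"or_edu": 6, "edu": 300, "dapi_area": 3.0},
]
closed_side = [
    {"or_edu": 2, "edu": 200, "dapi_area": 2.0},
    {"or_edu": 3, "edu": 300, "dapi_area": 3.0},
]

# Open/closed ratio under each normalization method.
ratios = {
    method: normalize(open_side, method) / normalize(closed_side, method)
    for method in ("half_section", "per_edu", "per_dapi_area")
}
```

      In this toy example the open and closed sides have matching EdU+ counts and DAPI+ areas, so all three methods report the same open/closed ratio. With real sections the denominators vary, which is why an empirical comparison such as the one reported in the new figure supplements is needed.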

      References

      Bashkirova, E. V., Klimpert, N., Monahan, K., Campbell, C. E., Osinski, J., Tan, L., Schieren, I., Pourmorady, A., Stecky, B., Barnea, G., Xie, X. S., Abdus-Saboor, I., Shykind, B. M., Marlin, B. J., Gronostajski, R. M., Fleischmann, A., & Lomvardas, S. (2023). Opposing, spatially-determined epigenetic forces impose restrictions on stochastic olfactory receptor choice. eLife, 12, RP87445. https://doi.org/10.7554/eLife.87445

      Coppola, D. M. (2012). Studies of olfactory system neural plasticity: The contribution of the unilateral naris occlusion technique. Neural Plasticity, 2012, 351752. https://doi.org/10.1155/2012/351752

      Fletcher, R. B., Das, D., Gadye, L., Street, K. N., Baudhuin, A., Wagner, A., Cole, M. B., Flores, Q., Choi, Y. G., Yosef, N., Purdom, E., Dudoit, S., Risso, D., & Ngai, J. (2017). Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution. Cell Stem Cell, 20(6), 817-830.e8. https://doi.org/10.1016/j.stem.2017.04.003

      Han, X., Jiang, Y., Feng, N., Yang, P., Zhang, M., Jin, W., Zhang, T., Huang, Z., Zhao, H., Zhang, K., Liu, S., & Hu, D. (2022). Comparison of the Homology Between Muskrat Scented Gland and Mouse Preputial Gland. Journal of Mammalian Evolution, 29(2), 435–446. https://doi.org/10.1007/s10914-022-09604-w

      Hanchate, N. K., Kondoh, K., Lu, Z., Kuang, D., Ye, X., Qiu, X., Pachter, L., Trapnell, C., & Buck, L. B. (2015). Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science (New York, N.Y.), 350(6265), 1251–1255. https://doi.org/10.1126/science.aad2456

      Hossain, K., Smith, M., & Santoro, S. W. (2023). A histological protocol for quantifying the birthrates of specific subtypes of olfactory sensory neurons in mice. STAR Protocols, 4(3), 102432. https://doi.org/10.1016/j.xpro.2023.102432

      Ibarra-Soria, X., Nakahara, T. S., Lilue, J., Jiang, Y., Trimmer, C., Souza, M. A., Netto, P. H., Ikegami, K., Murphy, N. R., Kusma, M., Kirton, A., Saraiva, L. R., Keane, T. M., Matsunami, H., Mainland, J., Papes, F., & Logan, D. W. (2017). Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife, 6. https://doi.org/10.7554/eLife.21476

      Kelemen, G. (1947). The junction of the nasal cavity and the pharyngeal tube in the rat. Archives of Otolaryngology, 45(2), 159–168. https://doi.org/10.1001/archotol.1947.00690010168002

      Lin, D. Y., Zhang, S.-Z., Block, E., & Katz, L. C. (2005). Encoding social signals in the mouse main olfactory bulb. Nature, 434(7032), 470–477. https://doi.org/10.1038/nature03414

      McIntyre, J. C., Titlow, W. B., & McClintock, T. S. (2010). Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons. Journal of Neuroscience Research, 88(15), 3243–3256. https://doi.org/10.1002/jnr.22497

      Pourmorady, A. D., Bashkirova, E. V., Chiariello, A. M., Belagzhal, H., Kodra, A., Duffié, R., Kahiapo, J., Monahan, K., Pulupa, J., Schieren, I., Osterhoudt, A., Dekker, J., Nicodemi, M., & Lomvardas, S. (2024). RNA-mediated symmetry breaking enables singular olfactory receptor choice. Nature, 625(7993), 181–188. https://doi.org/10.1038/s41586-023-06845-4

      Saraiva, L. R., Ibarra-Soria, X., Khan, M., Omura, M., Scialdone, A., Mombaerts, P., Marioni, J. C., & Logan, D. W. (2015). Hierarchical deconstruction of mouse olfactory sensory neurons: From whole mucosa to single-cell RNA-seq. Scientific Reports, 5, 18178. https://doi.org/10.1038/srep18178

      Sato-Akuhara, N., Horio, N., Kato-Namba, A., Yoshikawa, K., Niimura, Y., Ihara, S., Shirasu, M., & Touhara, K. (2016). Ligand Specificity and Evolution of Mammalian Musk Odor Receptors: Effect of Single Receptor Deletion on Odor Detection. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 36(16), 4482–4491. https://doi.org/10.1523/JNEUROSCI.3259-15.2016

      Scholz, P., Kalbe, B., Jansen, F., Altmueller, J., Becker, C., Mohrhardt, J., Schreiner, B., Gisselmann, G., Hatt, H., & Osterloh, S. (2016). Transcriptome Analysis of Murine Olfactory Sensory Neurons during Development Using Single Cell RNA-Seq. Chemical Senses, 41(4), 313–323. https://doi.org/10.1093/chemse/bjw003

      Schwende, F. J., Wiesler, D., Jorgenson, J. W., Carmack, M., & Novotny, M. (1986). Urinary volatile constituents of the house mouse, Mus musculus, and their endocrine dependency. Journal of Chemical Ecology, 12(1), 277–296. https://doi.org/10.1007/BF01045611

      Shirasu, M., Yoshikawa, K., Takai, Y., Nakashima, A., Takeuchi, H., Sakano, H., & Touhara, K. (2014). Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron, 81(1), 165–178. https://doi.org/10.1016/j.neuron.2013.10.021

      Tan, L., Li, Q., & Xie, X. S. (2015). Olfactory sensory neurons transiently express multiple olfactory receptors during development. Molecular Systems Biology, 11(12), 844. https://doi.org/10.15252/msb.20156639

      van der Linden, C. J., Gupta, P., Bhuiya, A. I., Riddick, K. R., Hossain, K., & Santoro, S. W. (2020). Olfactory Stimulation Regulates the Birth of Neurons That Express Specific Odorant Receptors. Cell Reports, 33(1), 108210. https://doi.org/10.1016/j.celrep.2020.108210

      van der Linden, C., Jakob, S., Gupta, P., Dulac, C., & Santoro, S. W. (2018). Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1), 5081. https://doi.org/10.1038/s41467-018-07120-1

      Verhaagen, J., Oestreicher, A. B., Gispen, W. H., & Margolis, F. L. (1989). The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 9(2), 683–691.

      Vihani, A., Hu, X. S., Gundala, S., Koyama, S., Block, E., & Matsunami, H. (2020). Semiochemical responsive olfactory sensory neurons are sexually dimorphic and plastic. eLife, 9, e54501. https://doi.org/10.7554/eLife.54501

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Le et al. aimed to explore whether AAV-mediated overexpression of Oct4 could induce neurogenic competence in adult murine Müller glia, a cell type that, unlike its counterparts in cold-blooded vertebrates, lacks regenerative potential in mammals. The primary goal was to determine whether Oct4 alone, or in combination with Notch signaling inhibition, could drive Müller glia to transdifferentiate into bipolar neurons, offering a potential strategy for retinal regeneration.

      The authors demonstrated that Oct4 overexpression alone resulted in the conversion of 5.1% of Müller glia into Otx2+ bipolar-like neurons by five weeks post-injury, compared to 1.1% at two weeks. To further enhance the efficiency of this conversion, they investigated the synergistic effect of Notch signaling inhibition by genetically disrupting Rbpj, a key Notch effector. Under these conditions, the percentage of Müller glia-derived bipolar cells increased significantly to 24.3%, compared to 4.5% in Rbpj-deficient controls without Oct4 overexpression. Similarly, in Notch1/2 double-knockout Müller glia, Oct4 overexpression increased the proportion of GFP+ bipolar cells from 6.6% to 15.8%.

      To elucidate the molecular mechanisms driving this reprogramming, the authors performed single-cell RNA sequencing (scRNA-seq) and ATAC-seq, revealing that Oct4 overexpression significantly altered gene regulatory networks. They identified Rfx4, Sox2, and Klf4 as potential mediators of Oct4-induced neurogenic competence, suggesting that Oct4 cooperates with endogenously expressed neurogenic factors to reshape Müller glia identity.

      Overall, this study aimed to establish Oct4 overexpression as a novel and efficient strategy to reprogram mammalian Müller glia into retinal neurons, demonstrating both its independent and synergistic effects with Notch pathway inhibition. The findings have important implications for regenerative therapies as they suggest that manipulating pluripotency factors in vivo could unlock the neurogenic potential of Müller glia for treating retinal degenerative diseases.

      Strengths:

      (1) Novelty: The study provides compelling evidence that Oct4 overexpression alone can induce Müller glia-to-bipolar neuron conversion, challenging the conventional view that mammalian Müller glia lacks neurogenic potential.

      (2) Technological Advances: The combination of a mouse line for Müller glia-specific labeling and genetic modification, AAV GFAP promoter-mediated gene expression, single-cell RNA-seq, and ATAC-seq provides a comprehensive mechanistic dissection of glial reprogramming.

      (3) Synergistic Effects: The finding that Oct4 overexpression enhances neurogenesis in the absence of Notch signaling introduces a new avenue for retinal repair strategies.

      Weaknesses:

      (1) In this study, the authors did not perform a comprehensive functional assessment of the bipolar cells derived from Müller glia to confirm their neuronal identity and functionality.

      (2) Demonstrating visual recovery in a bipolar cell-deficiency disease model would significantly enhance the translational impact of this work and further validate its therapeutic potential.

      Response: We thank the Reviewer for their evaluation. We agree that functional analysis of Müller glia-derived bipolar cells is indeed important, but is beyond the current scope of the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors harness single-cell RNAseq data from zebrafish and mice to identify Oct4 as a candidate driver of neurogenesis. They then use adeno-associated virus vectors to show that while Oct4 overexpression alone converts rare adult Müller glia (MG) to bipolar cells, it synergizes with Notch pathway inhibition to cause this neurogenesis (achieved by Cre-mediated knockout of Rbpj floxed allele). Importantly, they genetically lineage-mark adult MG using a GLAST-CreER transgene and a Sun-GFP reporter, so that any non-MG cells that convert can be identified unambiguously. This is crucial because several high-profile papers made erroneous claims using short promoters in the viral delivery vector itself to mark MG, but those promoters are leaky and mark other non-MG cell types, making it impossible to definitively state whether manipulations studied were actually causing neurogenesis, or were merely the result of expression in pre-existing neurons. Once the authors establish Oct4 + RbpjKO synergy they use snRNAseq/ATACseq to identify known and novel transcription factors that could play a role in driving neurogenesis.

      Strengths:

      The system to mark MG is stringent, so the authors are studying transdifferentiation, not artifactual effects due to leaky viral promoters. The synergy between Oct4 and Notch pathway blockade is notable. The single-cell results add the potential involvement of new players such as Rfx4 in adult-MG-neurogenesis.

      Weaknesses:

      The existing version is difficult to read due to an unusually high number of text errors (e.g. references to the wrong figure panels etc.). A fuller explanation for the fraction of non-MG cells seen in control scRNAseq assays is required, particularly because the neurogenic trajectory which is enhanced in the Oct4/Rbpj-KO context is also evident in the control retina. Claims regarding the involvement of transcription factors in adult neurogenesis (such as Rfx4) need to be toned down unless they are backed up with functional data. It is possible that such factors are important, but equally, they may have no role or a redundant role, and without functional tests, it's impossible to say one way or the other.

      Overall, the authors achieved what they set out to do, and have made new insights into how neurogenesis can be stimulated in MG. Ultimately, a major long-term goal in the field is to replace lost photoreceptors as this is most relevant to many human visual disorders, and while this paper (like all others before it) does not generate rods or cones, it opens new strategies to coax MG to form a related neuronal cell type. Their approach underscores the benefits of using a gold-standard approach for lineage tracing.

      We thank the Reviewer for their evaluation. We have made extensive changes to the manuscript to correct errors and modify discussion as recommended. These are detailed below in our point-by-point responses to specific recommendations to the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor corrections:

      (1) In Figure 1C, top GFAP-mCherry panel, two dim GFP+ cells appear to colocalize with Otx2. Is this caused by the optical imaging thickness, or do some Müller glia express Otx2?

      This indeed reflects the effects of optic imaging thickness. Colocalization of Sun1-GFP and Otx2 is not observed when Z-stack images are examined in GlastCreER;Sun1-GFP retinas. This can also be appreciated by the fact that, in cases of apparent overlap of nuclear envelope-targeted Sun1 and Otx2, the sizes of the labeled areas differ. In cases of true expression overlap, such as is seen following Oct4 overexpression, the labeled areas are the same size, or very nearly so.

      To determine whether the Glast-CreERT2 x Rosa26-LSL-Sun1-GFP mouse line cross-labels Otx2+ bipolar cells, the authors should image the mCherry control sample as a Z-stack with thin optical sections and a small pinhole, to check for co-labeling of GFP and Otx2.

      Please see above. Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. To avoid confusion, however, we have replaced these images with an additional image from a control mCherry-infected GlastCreER;Sun1-GFP retina processed for the same study.

      In the middle upper panel (Oct4-mCherry group), the white arrows indicate GFP colocalized with Otx2 signal, but these cells do not appear to be mCherry-positive; by contrast, the neighboring cells show strong mCherry expression but no colocalization with Otx2. The GFAP promoter-driven Oct4-mCherry may have stopped being expressed after the Müller glia were converted into Otx2+ bipolar cells, but is there an intermediate stage in which Oct4-mCherry and Otx2 are co-expressed? And after Müller glia-to-bipolar conversion, why is Glast-CreERT2-driven GFP expression not suppressed in the way that GFAP promoter-driven Oct4-mCherry is? Could the authors discuss this point?

      We observed a significant number of Müller glia-derived cells expressing both Otx2 and weak mCherry signal. GFP expression is driven by the ubiquitous CAG promoter following Cre-dependent excision of a transcriptional stop cassette. We have modified the text to make this point explicit.

      (2) In Figure S2b, the mouse is labeled with wild type; I assume it should be the same mouse line as Fig.1. Otherwise, the author should describe the source of the GFP signal.

      “Wildtype” in this case refers to GlastCreER;Sun1-GFP controls, which as the Reviewer correctly points out, are not truly wildtype. The genotype of these animals is specified in all figure legends, and is now referred to as “control” rather than “wildtype” in the figures and main text throughout.

      In Figure S2k and l, mCherry control panel, the GFP+ cells appear to co-label with Otx2. Again, is this vertical overlap caused by a thicker optical imaging layer, or by low Müller glia specificity of the mouse line? Please describe the meaning of the stars in Figure S2i,j in the figure legend. There are two panels labeled "n" in the quantification data.

      This is, again, an example of the thicker optical imaging layer causing apparent overlap. We have previously demonstrated that the Sun1-GFP+ cells do not co-label with Otx2 in GFAP-mCherry AAV-injected control retinas (Le et al., 2022; Fig. 2C). The asterisks (*) indicate mouse-on-mouse vascular staining, which is now clarified in the figure legend. The 2 figures labeled ‘n’ have been relabeled as ‘m’ and ‘n’.

      (3) In Figure 2c in the top panel, the Otx2 image was wrong; please replace it with the correct one.

      We thank the Reviewer for spotting this error. This is an inadvertent duplication of the single-channel Otx2 staining for the mCherry control sample. We have replaced this with the correct image.

      (4) In Figure 3a, the Rbpj-cKO mouse line was used, but where was the GFP signal from? Please verify the mouse line you used in your work. The same question is also asked in Figure S3, S4b.

      GlastCreER;Rbpj<sup>lox/lox</sup>;Sun1-GFP mice were used in Figure 3a. As now specified in the Methods and all figure legends, all mice used in this study carry both the GlastCreER and Sun1-GFP transgenes.

      (5) In Figure S4c,d, at the 5-week time point, quantifying the change in GFP+/Sox2- cells would help readers understand the percentage of Müller glia converted to bipolar cells, for comparison with Figure 2D, and would support the conclusion that this reflects Müller glia-to-bipolar conversion rather than Müller glia proliferation.

      Sox2-/GFP+ cells are a measure of Müller glia to bipolar cell conversion that complements that of GFP+/Otx2+ cells. This is now clarified in the text. We also include quantification of Sox2-/GFP+ neurons at 5 weeks post-injury in Fig. S5b.

      (6) In Figure S1b,c, there is a large portion of cells that are activated Müller glia after NMDA injury. Did the activated Müller glial cells lose their Müller glial identity? Between the loss of Müller glial identity and neuronal reprogramming, are there any markers that can be used to assess whether Müller glial cells are truly transdifferentiating into neurons rather than remaining in a reactive glial state or an intermediate phase?

      Wildtype Müller glia progressively revert to resting state, and by 72 hours post-injury have already lost expression of Klf4 and Myc (Hoang, et al. 2020), a point which is now specifically mentioned in the text. In GlastCreER;Sun1-GFP;Nfia/b/x<sup>lox/lox</sup>;Rbpj<sup>lox/lox</sup> Müller glia, reactive MG appear to largely convert to bipolar and amacrine-like cells, and it remains unclear if they eventually revert to a resting state (Le, et al. 2024).

      Reviewer #2 (Recommendations for the authors):

      This work demonstrates that Oct4 (Pou5f3) can induce neurogenesis in murine Müller glia (MG). Le et al start by showing that murine and zebrafish MG lack expression of Oct4 (Pou5f3) and its target Nanog. To assess the effect of Oct4 they first label adult MG with Sun1-GFP using tamoxifen-treated GlastCreER;Sun1-GFP mice, then later transduce in vivo with AAV vectors expressing mCherry alone or Oct4 + mCherry. Subsequently, they damage the retina with NMDA and assess the effects several weeks later. In Oct4+ cells at 2 weeks there is rare induction of the neural determinant Ascl1, down-regulation of the MG marker Sox2, induction of bipolar markers (Otx2, Scgn, Cabp5) but not amacrine (HuC/D) or rod (Nrl) markers. Combining Oct4 with Notch inhibition (deleting floxed Rbpj) synergistically increases bipolar cell induction, with Otx2 staining rising to >20% of GFP-marked cells, and cells losing MG identity (loss of Sox2/9). EdU labeling was negligible suggesting direct trans-differentiation. Similar synergy was seen upon combining Oct4 expression with Notch1/2 double gene knockout. Attempts to combine Oct4 with Nfia, Nfib, and Nfix loss were unsuccessful as the GFAP promoter driving Oct4 in MG seems to require these three related transcription factors. scRNAseq confirmed the Oct4-overexpression/Rbpj-KO-driven increase in bipolar cells and decrease in MG cells and revealed that these manipulations may enhance bipolar cell genesis by repressing genes that define quiescent MG and enhancing expression of genes that define reactive MG and neurogenic cells. Finally, multiomic snRNA/scATAC-seq analysis was performed to assess the effect of Oct4 in wt or Rbpj null MG. This approach revealed that, as anticipated, more genes were up and down-regulated in the context of both manipulations vs Oct4 OE alone. Moreover, Oct4 and Rbpj KO reduced chromatin accessibility at target motifs for transcription factors involved in MG identity/quiescence, while MGPCs showed elevated accessibility for neurogenic factors. The combination of Oct4 OE and Rbpj KO induces accessibility at various interesting TF sites that may contribute to the synergistic neurogenesis, including Rfx4, Klf4, Insm1, and others.

      This is an interesting paper that adds to the growing literature on how neurogenesis can be induced in mammalian MG. The focus on Oct4 is interesting and the synergistic effects are striking and analyzed in some detail with scRNAseq and multiomic snRNA/scATACseq. The latter results provide useful new insight into transcriptional programs that may be critical in driving neurogenesis. Functional insight into these new candidates is not explored in this manuscript, but that's beyond the scope of the current work and forms the basis for new studies. There are some overreaching statements in the Discussion that need to be toned down, but apart from that and a long list of textual errors that need to be fixed, this paper is a valuable contribution to the field.

      Major comments

      There are numerous textual errors (some, but not all, examples are detailed in minor comments). It was difficult to follow this paper given the unusually high number of textual errors and the abbreviated legends. Greater attention should be paid to harmonizing the text with the figures and ensuring that the legends are correct and complete.

      The manuscript has been proofread carefully and errors corrected.

      The opening section of the scRNAseq data should outline briefly why sorting for GFP labeled cells purifies a significant fraction of non-MG cell types, despite the earlier claim (which agrees with other publications) that GLAST-CreER transgene expression is highly specific to MG. Presumably, it mainly/totally reflects the co-purification of cells, cell fragments, and/or cell-free mRNA from other lineages. Is it also possible that a fraction (however small) of these cells reflect low-level spurious/temporary activation of GLAST-CreER expression in non-MG? The "contamination" is present despite the addition of the GFP sequence to the reference genome (as explained in Methods). They mention: "a clear differentiation trajectory connecting Muller glia, neurogenic Muller glia-derived progenitor cells (MGPCs), and differentiating amacrine and bipolar cells (Fig. 3b)". However, the same trajectory is evident in control mCherry samples, so one could argue that this trajectory is active in normal retina at some low rate, but that would/should equate to rare sun-GFP+ non-MG in controls. Are there any such cells, even extremely rarely, or is it truly 0%? At any rate, the authors need to raise these concerns and offer some explanation(s) at the start of their scRNAseq Results section. If there are really no such sun-GFP+ cells, the authors should comment on the presence of the apparent inactive trajectory in the Discussion.

      Since we first described this line (de Melo, et al. 2012), we have examined thousands of sections of GlastCreER;Sun1-GFP retinas, and have yet to see a single GFP-positive neuron. We have also previously shown (Hoang, et al. 2020) that FACS-based isolation of GFP-positive cells from GlastCreER;Sun1-GFP yields a roughly thirty-fold enrichment of Muller glia, implying the presence of small numbers of contaminating neurons. We therefore conclude that the presence of small numbers of neurons (rods, cones, bipolar, and amacrine cells) in the control GlastCreER;Sun1-GFP represents contamination rather than low levels of glia-to-neuron conversion, particularly since we are unable to detect the expression of genes such as neurogenic bHLH factors or immature photoreceptor precursor-specific factors such as Prdm1 that indicate the presence of intermediate cell states. This is now addressed in the Results section related to both Figures 3 and 4.

      Discussion:

      In reference to other strategies to induce neurogenesis the authors make the claim that Oct4 is fundamentally different: "In these cases, Müller glia broadly upregulate proneural genes and/or downregulate Notch signaling. Oct4 instead induces expression of the neurogenic transcription factor Rfx4, which is not expressed in developing retina. It is likely that activation of this parallel pathway to neurogenic competence in part accounts for synergistic induction of neurogenesis seen in Rbpj-deficient Müller glia". First, all these strategies, including Oct4, seem to activate bHLH factors, so they have that in common and the authors should note that overlap. More seriously, without functional tests (e.g. KO Rfx4) the authors need to dial back the over-reaching statement that Rfx4 is the fundamental mechanism driving the Oct4 effect. They can certainly suggest that this is one possibility, but equally, Rfx4 may have very little or no effect on neurogenesis, or it could act redundantly with some of the other factors the authors uncovered. It's impossible to know without functional data, so they either need to add the functional data, or hold back on the strong one-sided and overreaching claim.

      Since both Rfx4 expression and motif accessibility are selectively observed following Oct4 overexpression, and Rfx4 also has known neurogenic activity, we stand by our conclusion that it is a particularly strong candidate for mediating the neurogenic effects of Oct4 overexpression. However, the Reviewer is correct that in the absence of functional data, speculation about its function should be qualified. We have done this in the revised manuscript.

      Minor comments

      This sentence in the Results is confusing: "While expression of neurogenic bHLH factors driven by the Gfap promoter was rapidly silenced in Muller glia and activated in amacrine and retinal ganglion cells, Gfap-Oct4-mCherry remained selectively expressed in Muller glia but did not induce detectable levels of Muller glia-derived neurogenesis in the uninjured retina (Le et al., 2022)". The cited reference is at the end so it sounds like the Oct4 assay was performed in Le et al 2022, and there is no reference to a Figure for the Oct4 data in the current paper.

      As stated here, in Le, et al. 2022, we did not observe any conversion of Sun1-GFP-positive Muller glia to neurons in the absence of injury. In the current study, we instead test whether NMDA-induced excitotoxicity induced glia to neuron conversion in Muller glia overexpressing Oct4. This is now made clear in the revised text.

      There are many errors and omissions regarding Figure S2:

      Figure S2a, b legend, and panels do not match. 2a should be a schematic of the strategy to label MG with Sun1-GFP using GLAST-Cre and a floxed Sun1-GFP allele, but that's missing and instead, the current 2a is a schematic of AAV vectors. It seems that the current 2b legend may describe the combination of the current 2a and 2b panels.

      This has been corrected.

      Figure S2: Asterisks label certain stained elements in the Oct4 labeled panels, but there is no explanation in the legend. Are these meant to indicate non-specific staining? If so, what is the evidence that the signal is non-specific?

      These asterisks represent non-specific mouse-on-mouse vascular staining observed with the mouse monoclonal anti-Oct4 used in this study. This is now indicated in the figure legend.

      The text refers to Ascl1 staining in Figure S2e,f, but it's S2g,h.

      This has been corrected.

      Re this: "While Sun1-GFP-positive cells infected with Oct4-mCherry mostly express the Muller glial marker Sox2 (Fig. S2a,b), from 2 weeks post-injury onwards a subset of GFP positive cells did not show detectable Sox2 expression (Fig. S2b, yellow arrows)". Figure S2a, b are schematic diagrams, not immunofluorescence. They probably mean Figure S2c, d.

      This has been corrected.

      Fig S2m is mislabeled "n".

      This has been corrected.

      There are probably other errors with this figure, but I mostly gave up at this point. The authors should go through the paper to find and correct any additional mistakes/omissions in the text and legends.

      The manuscript has been carefully proofread and errors corrected.

      The figure panels are not always mentioned in the order that they appear. There are many examples.

      Figure panels are now mentioned in the order that they appear.

      Several schematics use "d-18-14" to indicate "day -18 to -14". The former is at first uninterpretable or at best unclear (could mean day -18 to day 14), perhaps d -18 to -14, or d -18:-14 would be clearer.

      This has been corrected.

      Re: "AAV-infected wildtype Muller glia could be readily identified by selective expression of Oct4 (Fig. 4e). Wildtype Oct4-expressing Muller glia give rise to both small numbers of neurogenic MGPCs (Fig. 4b),". Figure 4E is labeled Pou5f1, but it would be helpful to avoid confusion by also indicating on the figure that Pou5f1 = Oct4; and Fig 4b does not indicate neurogenic MGPCs (perhaps they mean 4c).

      This has been corrected.

      Some parts of the Results are written in the present tense and should be in the past tense (for guidance: https://www.nature.com/scitable/topicpage/effective-writing13815989/).

      Past tense is now used throughout.

      Pit1 (Pou1f1) is referred to as a "close variant" of Oct4/Pou5f1, but this is unclear (e.g. variant could mean a splice variant from the same locus) and the term "paralogue" should be used.

      “Paralogue” is now used in this context.

      Re: "Infection with Oct4-mCherry vector induced both Oct4 (Fig. S5e) and Ascl1 (Fig. S5d) expression in Notch1/2-deficient Müller glia." Supplementary image 5d is the one depicting Oct4 and 5e is the one showing Ascl1. However, the reference is reversed.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We deeply appreciate the reviewer’s careful review and critiques. These are excellent critiques that we are working on and probably require a few more years of work. Published together, we believe these critiques add value to our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Yu and coworkers investigates the potential role of Secretory leukocyte protease inhibitor (SLPI) in Lyme arthritis. They show that, after needle inoculation of the Lyme disease (LD) agent, B. burgdorferi, compared to wild type mice, a SLPI-deficient mouse suffers elevated bacterial burden, joint swelling and inflammation, pro-inflammatory cytokines in the joint, and levels of serum neutrophil elastase (NE). They suggest that SLPI levels of Lyme disease patients are diminished relative to healthy controls. Finally, they find that SLPI may interact directly with B. burgdorferi.

      Strengths:

      Many of these observations are interesting and the use of SLPI-deficient mice is useful (and has not previously been done).

      Weaknesses:

      (a) The known role of SLPI in dampening inflammation and inflammatory damage by inhibition of NE makes the enhanced inflammation in the joint of B. burgdorferi-infected mice a predicted result; (b) The potential contribution of the greater bacterial burden to the enhanced inflammation is acknowledged but not experimentally addressed; (c) The relationship of SLPI binding by B. burgdorferi to the enhanced disease of SLPI-deficient mice is not addressed in this study, making the inclusion of this observation in this manuscript incomplete; and (d) assessment of SLPI levels in healthy controls vs. Lyme disease patients is inadequate.

      We greatly appreciate the critiques, and we do agree. Even though the observation of NE level is predictable, we believe that it is important to actually demonstrate it in the context of murine Lyme arthritis. The function of SLPI goes beyond inhibiting NE. As this is an ongoing project in our lab, we believe that the current study serves as a good starting point to explore the pleiotropic effects of SLPI in the pathogenesis of murine Lyme arthritis and in patients. The critiques here are of great value to our research.

      Comments on revised version:

      Several of the points were addressed in the revised manuscript, but the following issues remain:

      Previous point that the relationship of SLPI binding to B. burgdorferi to the enhanced disease of SLPI-deficient mice is not investigated: The authors indicate that such investigations are ongoing. In the absence of any findings, I recommend that their interesting BASEHIT and subsequent studies be presented in a future study, which would have high impact.

      We thank the reviewer for the critique. We do agree that this part of the story is not complete. However, we would like to keep the BASEHIT and binding data in the paper, as we believe that it is an important finding. We confirmed the binding using ELISA, flow cytometry, and immunofluorescent microscopy. We showed that the binding is specific to the infectious strain of B. burgdorferi, and is thus likely to contribute to the pathogenesis of murine Lyme arthritis. Our data suggest that SLPI can directly interact with a B. burgdorferi protein. We are exploring the biological significance of the binding, and this finding can be further explored by other labs too.

      Previous recommendation 1: (The authors added lines 267-68, not 287-68). This ambiguity is acknowledged but remains. In addition, in the revised manuscript, the authors state "However, these data also emphasize the importance of SLPI in controlling the development of inflammation in periarticular tissues of B. burgdorferi-infected mice." Given acknowledged limitations of interpretation, "suggest" would be more appropriate than "emphasize".

      We thank the reviewer for the careful reading, and we apologize for the mistake. The change has been made accordingly (line 268).

      Previous recommendation 5: The lack of clinical samples can be a challenge. Nevertheless, 4 of the 7 samples from LD patients are from individuals suffering from EM rather than arthritis (i.e., the manifestation that is the topic of the study) and some who are sampled multiple times, make an objective statistical comparison difficult. I don't have a suggestion as to how to address the difference in number of samples from a given subject. However, the authors could consider segregating EM vs. LA in their analysis (although it appears that limiting the comparison between HC and LA patients would not reveal a statistical difference).

      We thank the reviewer for the critique, and we agree that the patient data presented are not ideal. We believe that at this point combining the samples is most logical, as the number of samples we have from patients with Lyme arthritis is fairly limited. We stated the limitation in the discussion. We do believe that the finding of the correlation is important. It suggests the potential function of SLPI in patients, beyond murine infection.

      Moreover, other groups with larger numbers of different samples can elucidate the relationship further.

      Previous recommendation 6: Given that binding of SLPI to the bacterial surface is an essential aspect of the authors' model, and that the ELISA assay to indicate SLPI binding used cell lysates rather than intact bacteria, a control PI staining to validate the integrity of bacteria seems reasonable.

      We appreciate the suggestion and have provided the propidium iodide staining in Supplemental Figure 5 (lines 539-542, 568-569, 718-722).

      Previous recommendation 8: The inclusion of a no serum control (that presumably shows 100% viability) would validate the authors' assertion that 20% serum has bactericidal activity.

      We appreciate the suggestion. As stated in the manuscript (line 583-584), the percent viability was normalized to the control spirochetes culture without any treatment. Thus, the control spirochetes culture, without serum and SLPI treatment, showed 100% viability. We have revised Supplemental Figure 3 accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous timepoints. However, my confidence in the findings and their interpretation is undermined by an incomplete justification for the expected outcomes for each of the proposed scenarios.

      As detailed below, the reviewer’s comment motivated us to conduct simulations to establish the relationship between the scenarios that we seek to adjudicate and the empirical outcomes.

      Main Concern

      Fig 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.:

      - Scenario1. Temporally convergent, same trajectories through connectome state space

      - Scenario2. Temporally divergent, different trajectories through connectome state space

      However, based on my understanding (and apologies if I am mistaken), I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in fig 2C, or the statements in the main text, i.e.:

      - For scenario1, "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      - For scenario2, "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in fig2C from? They do not seem to be fully justified via previous literature, theory, or simulations?

      In particular, I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points. It seems to me that you would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenario 1 and 2 (and under what assumptions it is valid).

      Thank you for raising this important point. In response, we have now included simulation results to complement our earlier authors’ response, which provided literature references and a theoretical explanation of the on-/off-diagonal ratio metric.

      In the absence of theory, the authors could use surrogate data for scenario 1 and 2. For example:

      a. For scenario 1, run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      Note: This simulation was included in the previous round of authors’ responses.

      b. For scenario 2, run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      The authors have provided CRP plots for option a. It shows a CRP, as expected, consistent with scenario 1. This is a useful sanity check. However, as mentioned above, it does not ensure that all CRPs under this scenario will look like this.

      However, the authors have not shown a CRP as per option b. As such, there is an incomplete justification for the expected outcomes of the scenarios.
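      The block-wise temporal shuffling suggested in option b can be sketched as follows. This is only an illustrative sketch, not code from the manuscript: the function name `block_shuffle`, the block length, and the toy input are assumptions introduced here, and the real analysis would operate on connectome time series rather than a simple integer sequence.

```python
import numpy as np


def block_shuffle(series, block_len, rng):
    """Cut `series` into consecutive blocks along time (axis 0) and permute the blocks.

    Hypothetical helper: samples keep their order inside each block,
    but the order of the blocks themselves is randomized.
    """
    series = np.asarray(series)
    # Split points at block_len, 2*block_len, ...; the last block may be shorter.
    split_points = np.arange(block_len, series.shape[0], block_len)
    blocks = np.array_split(series, split_points)
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order], axis=0)


rng = np.random.default_rng(0)
toy_series = np.arange(12)  # stand-in for a time-resolved signal
shuffled = block_shuffle(toy_series, block_len=4, rng=rng)
```

      Feeding the original series as modality 1 and the shuffled series as modality 2 would then give one surrogate instance of scenario 2, with within-block temporal structure preserved.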

      Note that another option, which has not been carried out, is to use full simulations, with clearly specified assumptions, for scenario1 and 2. One way of doing this is to use a simplified (state-space) setup where you randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in an N-dimensional connectome state space.

      Using this, you can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities

      We followed the reviewer’s suggestion and have now included full simulations to address the concerns regarding the theory of the on-/off-diagonal ratio metric. As recommended, we defined a random quantized signal with N levels to represent the recurrent manifestation of N fixed connectome states. This setup was used to demonstrate the relationship between the two scenarios and the CRP observations used to adjudicate between the scenarios in our paper.

      The CRP matrices in Fig. S10 provide an example illustration of this simulation. In the case where the two state timeseries are identical, there are more co-occurrences of the same state (white entries) on the diagonal than off the diagonal (left subplot). This is in line with Scenario 1, where both spatial and temporal convergence are present. Conversely, in Scenario 2, where state time courses are shuffled, co-occurrences of the same states are more dispersed, and the diagonal prominence vanishes (right subplot). This difference illustrates how the CRP reflects the presence or absence of temporal alignment, dissociating scenarios 1 and 2.

      To quantitatively validate this observation, we calculated the on-/off-diagonal ratio across simulations with varying N values. For Scenario 2 (shuffled version), the ratio consistently remained close to 1, indicating the absence of temporal synchronization. In contrast, Scenario 1 (non-shuffled version) produced significantly higher ratios, exceeding 1, confirming the metric's ability to capture meaningful synchrony. These results demonstrate that the simulations successfully replicate the expected relationship between the two scenarios and the CRPs, and validate the theoretical foundation of the ratio metric under the defined assumptions.
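      The logic of this simulation can be illustrated with a minimal sketch. It is not the code used for Fig. S10: the simplification that a CRP entry is 1 whenever the two state labels match, the function names `crp` and `on_off_diagonal_ratio`, and the choices of 5 states and 500 timepoints are all assumptions made here for illustration.

```python
import numpy as np


def crp(states_a, states_b):
    """Binary cross-recurrence matrix: entry (i, j) is True when
    modality A at time i and modality B at time j are in the same state."""
    return states_a[:, None] == states_b[None, :]


def on_off_diagonal_ratio(crp_matrix):
    """Mean recurrence rate on the main diagonal divided by the mean rate off it."""
    n = crp_matrix.shape[0]
    on_rate = np.diag(crp_matrix).mean()
    off_rate = crp_matrix[~np.eye(n, dtype=bool)].mean()
    return on_rate / off_rate


rng = np.random.default_rng(0)
n_states, n_time = 5, 500  # illustrative values, not taken from the manuscript

# Random quantized signal: each timepoint is assigned one of N recurring states.
states = rng.integers(0, n_states, size=n_time)

# Scenario 1: identical state time courses in both modalities.
ratio_scenario1 = on_off_diagonal_ratio(crp(states, states))

# Scenario 2: same states, but the time course of modality B is shuffled.
ratio_scenario2 = on_off_diagonal_ratio(crp(states, rng.permutation(states)))

print(f"Scenario 1 ratio: {ratio_scenario1:.2f}")
print(f"Scenario 2 ratio: {ratio_scenario2:.2f}")
```

      In this toy setting every on-diagonal entry matches under Scenario 1, so the ratio is roughly the inverse of the chance rate of a state match, whereas under Scenario 2 the on- and off-diagonal rates are both near chance and the ratio stays close to 1.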

      Minor Concern

      Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." It is great that the authors provide both. However, given that FC in EEG is almost totally dominated by spatial leakage (see Hipp paper), the main results/figures for the scalp EEG should be done using spatial leakage corrected EEG data.

      Thank you. We agree that source leakage is an important consideration, which is why the current work investigated the intracranial EEG-fMRI data as a primary approach and subsequently added the scalp EEG-fMRI approach. While source leakage correction is essential for addressing spurious connectivity, it can also risk removing genuine functional connectivity that includes zero-lag relationships. We are reassured by the observation that the scalp data both without and with leakage correction confirmed the findings of the intracranial data, i.e., the presence of spatial and a lack of temporal cross-modal convergence. As such we do not believe that source leakage had a considerable impact on the specific question at hand.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also did a control analysis and verified the effect of temporal window size or different functional connectivity operationalizations. I also applaud their effort to make the analysis code open-sourced.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors answer my concerns and they are resolved.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study investigates alterations in the autophagic-lysosomal pathway in the Q175 HD knock-in model crossed with the TRGL autophagy reporter mouse. The findings provide valuable insights into autophagy dynamics in HD and the potential therapeutic benefits of modulating this pathway. The study suggests that autophagy stimulation may offer therapeutic benefits in the early stages of HD progression, with mTOR inhibition showing promise in ameliorating lysosomal pathology and reducing mutant huntingtin accumulation.

      However, the data raises concerns regarding the strength of the evidence. The observed changes in autophagic markers, such as autolysosome and lysosome numbers, are relatively modest, and the Western blot results do not fully match the quantitative results. These discrepancies highlight the need for further validation and more pronounced effects to strengthen the conclusions. While the study suggests the potential of autophagy regulation as a long-term therapeutic strategy, additional experiments and more reliable data are necessary to confirm the broader applicability of the TRGL/Q175 mouse model.

      Furthermore, the 2004 publication by Ravikumar et al. demonstrated that inhibition of mTOR by rapamycin or the rapamycin ester CCI-779 induces autophagy and reduces the toxicity of polyglutamine expansions in fly and mouse models of Huntington's disease. mTOR is a key regulator of autophagy, and its inhibition has been explored as a therapeutic strategy for various neurodegenerative diseases, including HD. Studies suggest that inhibiting mTOR enhances autophagy, leading to the clearance of mHTT aggregates. Given that dysfunction of the autophagic-lysosomal pathway and lysosomal function in HD is already well-established, and that mTOR inhibition as a therapeutic approach for HD is also known, this study does not present entirely novel findings.

      Major Concerns:

      (1) In Figure 3A1 and A2, delayed and/or deficient acidification of AL causes deficits in the reformation of LY to replenish the LY pool. However, in Figure S2D, there is no difference in AL formation or substrate degradation, as shown by the Western blotting results for CTSD and CTSB. How can these discrepancies be explained?

      We appreciate the reviewer raising this point, and we agree with the concern. Please note that the material used for our immunoblotting was hemibrain homogenates, containing not only neurons but also glial cells, so the results for any protein, e.g., CTSD or CTSB in Fig. S2D, represented combined signals from neurons and glial cells. Our longstanding experience with western blot analysis of autophagy pathway markers is that signals from glial cells significantly interfere with/dilute the signals from neurons. By contrast, the immunofluorescence (IF) results in Fig. 3A, obtained with the assistance of tfLC3 probe and hue angle-based AV/LY subtype analysis, revealed the in situ conditions of the AL and LY within neurons selectively, which reflects the advantage of using the in vivo neuron-specific expression of the LC3 probe combined with IF with a LY marker in this study and our other related studies (Lee, Rao et al. 2019, Lee, Yang et al. 2022) as explained in the Introduction of this paper. Please also refer to a similar discussion regarding the WB-detected protein levels of p-ATG14 in L542-547. 

      (2) The results demonstrate that in the brain sections of 17-month-old TRGL/Q175 mice, there was an increase in the number of acidic autolysosomes (AL), including poorly acidified autolysosomes (pa-AL), alongside a decrease in lysosome (LY) numbers. These AL/pa-AL changes were not significant in 2-month-old or 7-month-old TRGL/Q175 mice, where only a reduction in lysosome numbers was observed. This indicates that these changes, representing damage to the autophagy-lysosome pathway (ALP), manifest only at later stages of the disease. Considering that the ALP is affected predominantly in the advanced stages of the disease (e.g., at 17 months), why were 6-month-old TRGL/Q175 mice selected for oral mTORi INK treatment, and why was the treatment duration restricted to just 3 weeks?

      We thank the reviewer for the comment. A key outcome measure in our evaluation of mTORi treatment was amelioration of mHTT pathology, i.e., mHTT aggregates/IBs. Before conducting the mTORi treatment experiments, we had learned from our assessments of age-associated progression of mHTT aggresomes/IBs in mice of different ages (e.g., 2-, 6-, 10- and 17-mo) that there were already severe mHTT accumulations in Q175 at 10-mo-old (e.g., Fig. 2A). This is consistent with a previous report (Carty, Berson et al. 2015) showing that striatal mHTT inclusions dynamically increase from 4 to 8 months. From a therapeutic point of view, more aggregates in the mouse brain would make it more difficult for the autophagy machinery to clear these aggregates. Thus, the high degree of aggregates in 10- or 17-mo may not be modifiable by the mTORi and/or may prevent reliable/sensitive measurements on mTORi-induced phenotype changes. We therefore chose to apply the treatment to younger (i.e., 6-mo-old) mice when the mHTT pathology was not so severe, with detectable, albeit mild, ALP abnormality. Additionally, due to the 2-year funding limit for this project, there was insufficient time to generate a large set of old mice (e.g., ~18-mo) for another drug treatment experiment. In future studies, it might be worthwhile to conduct the treatment “in the advanced stages of the disease (e.g., ~18-mo)” to further examine the modification potential of the mTORi on the ALP as well as the HTT aggregations. As for the treatment duration, we were interested in an acute treatment schedule given that, in our dosing tests, we observed rapid responses to the treatment (e.g., target engagement) in a few days even with one dose, and that the 14-15-day treatments produced consistent responses (e.g., Fig. S3A). Long-term treatment, however, would be worth testing in the future, although our current study informs a therapeutic approach that has been suggested by others involving intermittent/pulsatile administration of mTOR inhibitors to minimize side effects of chronic long-term administration.

      (3) Is the extent of motor dysfunction in TRGL/Q175 mice comparable to that in Q175 mice? Does the administration of mTORi INK improve these symptoms?

      Unfortunately, we were unable to investigate motor function experimentally with specific assays such as open field or rotarod tests in this study (partly because the funded research period coincided with the peak of the COVID-19 pandemic in 2020). Based on our experience in handling the mice, we did not notice any obvious differences between Q175 and TRGL/Q175 mice, or any improvements after the acute mTORi INK treatment.

      (4) Why is eGFP expression not visible in Fig. 6A in TRGL-Veh mice? Additionally, why do normal (non-poly-Q) mice have fewer lysosomes (LY) than TRGL/Q175-INK mice? IHC results also show that CTSD levels are lower in TRGL mice compared to TRGL/Q175-INK mice. Does this suggest lysosome dysfunction in TRGL-Veh mice?

      We appreciate the reviewer raising this point. We have corrected this by slightly increasing the eGFP signal in the green channel and the merged channels equally for all genotypes, and the revised Fig. 6A shows better eGFP signals. The higher LY numbers/CTSD levels in TRGL/Q175-INK mice compared to control TRGL-Veh mice do not necessarily imply LY dysfunction in TRGL mice; rather, they likely suggest that mTORi treatment induces LY biogenesis. Our original characterization of TRGL mice of varying ages showed that the low expression of the tgLC3 construct produces only a very small increment in total LC3, resulting in no discernible functional changes in the autophagy pathway (Lee, Rao et al. 2019). The mechanism underlying such LY biogenesis, e.g., TFEB activation following mTOR inhibition, remains to be investigated in future studies.

      (5) In Figure 5A, the phosphorylation of ATG14 (S29) shows minimal differences in Western blotting, which appears inconsistent with the quantitative results. A similar issue is observed in the quantification of Endo-LC3.

      We welcome the reviewer’s point. Bands showing larger differences in p-ATG14 (S29) are now used in the revised Fig. 5A, making the images more representative and more consistent with the quantitative results. Similar changes have also been made to the Endo-LC3 data at the bottom of Fig. 5A.

      (6) In Figure S2A and Figure S2B, 17-month-old TRGL/Q175 mice show a decrease in pp70S6K and the p-ULK1/ULK1 ratio, but no changes are observed in autophagy-related markers. Do these results indicate only a slight change in autophagy at this stage in TRGL/Q175 mice? Since the mTOR pathway regulates multiple cellular mechanisms, could mTOR also influence other processes? Is it possible that additional mechanisms are involved?

      We completely agree with the reviewer. As mentioned at multiple locations in the text, ALP alterations in Q175 and TRGL/Q175 mice are mild even at a relatively old age (e.g., 17-mo), especially at the protein levels detected by immunoblotting. We agree that although the mild alterations in the levels of pp70S6K (T389) and the p-ULK1/ULK1 ratio may indicate “a slight change in autophagy”, they may also imply that other cellular processes are involved, given that mTOR signaling regulates multiple cellular functions. In particular, p70S6K/p-p70S6K, an mTOR substrate used as a readout for mTOR activity in this study, is a key component of the protein synthesis pathway (Wang and Proud 2006, Magnuson, Ekim et al. 2012), so its changes may reflect alterations not only in the autophagy pathway but also in the protein synthesis pathway. [A related discussion about mTOR/protein synthesis pathways, in response to a comment from Reviewer 2, has been incorporated into the text under Discussion, L633-640]

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have explored the beneficial effect of autophagy upregulation in the context of HD pathology in a disease stage-specific manner. The authors have observed functional autophagy lysosomal pathway (ALP) and its machineries at the early stage in the HD mouse model, whereas impairment of ALP has been documented at the later stages of the disease progression. Eventually, the authors took advantage of the operational ALP pathway at the early stage of HD pathology, in order to upregulate ALP and autophagy flux by inhibiting mTORC1 in vivo, which ultimately reverted back to multiple ALP-related abnormalities and phenotypes. Therefore, this manuscript is a promising effort to shed light on the therapeutic interventions with which HD pathology can be treated at the patient level in the future.

      Strengths:

      The study has shown the alteration of ALP in the HD mouse model in a very detailed manner. Such stage-dependent in vivo study will be informative and has not been done before. Also, this research provides possible therapeutic interventions for patients in the future.

      Weaknesses:

      Some constructive comments and suggestions in order to reflect the key aspects and concepts better in the manuscript:

      (1) The authors have observed lysosome number alteration in a temporally regulated disease stage-specific manner. In this scenario investigation of regulation, localization, and level of TFEB, the transcription factor required for lysosome biogenesis, would be interesting and informative.

      We thank the reviewer for this point and completely agree that exploring TFEB-related aspects would be interesting; this will be investigated in future studies.

      (2) For the general scientific community better clarification of the short forms will be useful. For example, in line 97, page 4, AP full form would be useful. Also 'metabolized via autophagy' can be replaced by 'degraded via autophagy'.

      We thank the reviewer for raising this point. We introduced each abbreviation where the full term first appears; “AP”, for example, was introduced in (previous) Line 69, where “autophagosome” first appears. We agree with the reviewer that the manuscript should be easy to read for the general scientific community, and we have therefore added an Abbreviations section after the Key Words section, listing the abbreviations used in this manuscript.

      Also, the word “metabolized” has been replaced with “degraded” as suggested. 

      (3) The nuclear vs cytosolic localization of HTT aggregates shown in Figure 2, are very interesting. The increase in cytosolic HTT aggregate formation at 10 months compared to 6 months probably suggests spatio-temporal regulation of aggregate formation. The authors could comment in a more elaborate manner, on the reason and impact of this kind of regulation of aggregate formation in the context of HD pathology.

      We value the reviewer’s important point. Previous studies have well documented that mHTT aggregates exist in both intranuclear and extranuclear locations in the brains of both human HD and mouse models (DiFiglia, Sapp et al. 1997, Li, Li et al. 1999, Carty, Berson et al. 2015, Peng, Wu et al. 2016, Berg, Veeranna et al. 2024). HTT can travel between the nucleus and cytoplasm and the default location for HTT is cytoplasmic, and thus the occurrence of nuclear mHTT aggregates is considered as a result of dysfunction in the nuclear exporting system for proteins (DiFiglia, Sapp et al. 1995, Gutekunst, Levey et al. 1995, Sharp, Loev et al. 1995, Cornett, Cao et al. 2005) while other factors such as phosphorylation of HTT may also affect nuclear targeting (DeGuire, Ruggeri et al. 2018). Extranuclear aggregates of mHTT usually appear later than nuclear aggregates and develop more aggressively in terms of numbers and pace after their appearance (Li, Li et al. 1999, Carty, Berson et al. 2015, Landles, Milton et al. 2020). The fact that there are neurons containing extranuclear aggregates without having nuclear aggregates within the same cells (Carty, Berson et al. 2015) does not support a nuclear-cytoplasmic sequence for aggregate formation, implying different mechanisms controlling the formation of these two types of aggregates. It was reported that there were no significant differences in toxicity associated with the presence of nuclear compared with extranuclear aggregates (Hackam, Singaraja et al. 1999), while other studies have proposed that nuclear aggregates correlate with transcriptional dysfunction while extranuclear aggregates may impair neuronal communication and can track disease progression (Li, Li et al. 1999, Benn, Landles et al. 2005, Landles, Milton et al. 2020). Thus, the observation of a higher level of extranuclear mHTT aggregates at 10-mo compared to 6-mo from the present study is consistent with previous findings mentioned above. 
In addition, our EM observations of a homogeneous granular/short fine fibril ultrastructure in both nuclear and extranuclear aggregates are consistent with findings from mouse model studies (Davies, Turmaine et al. 1997, Scherzinger, Lurz et al. 1997). Interestingly, this differs from in vitro studies, in which nuclear aggregates exhibited a core-and-shell structure but extranuclear aggregates lacked the shell (Riguet, Mahul-Mellier et al. 2021), reflecting differences between in vivo and in vitro conditions. Taken together, although this and previous studies have tried to understand the differences between nuclear and extranuclear aggregates, the mechanisms underlying the spatiotemporal regulation of aggregate formation have not yet been fully revealed and will require additional investigation.

      (4) In this manuscript, the authors have convincingly shown that mTOR inhibition is inducing autophagy in the HD mouse model in vivo. On the other hand, mTOR inhibition would also reduce overall cellular protein translation. This aspect of mTOR inhibition can also potentially contribute to the alleviation of disease phenotype and disease symptoms by reducing protein overload in HD pathology. The authors' comments regarding this aspect would be appreciated.

      We recognize the value of the reviewer’s point, with which we completely agree. Lowering mHTT by interfering with protein translation (e.g., through RNAi or antisense oligonucleotides) has been an attractive strategy in HD therapeutic development (Kordasiewicz, Stanek et al. 2012, Tabrizi, Ghosh et al. 2019). As mentioned above, mTOR regulates multiple cellular pathways including protein synthesis, and inhibition of mTOR, as done in the present study, potentially affects protein synthesis as well. While the decreases in mHTT signals in our results (Fig. 7) can be interpreted as a result of autophagy-mediated clearance of mHTT, we cannot exclude the possibility that mTOR inhibition also reduces HTT production, which may contribute to the observed results; future studies should determine how significant such a contribution is. [The above description has been incorporated into the text under Discussion, L633-640]

      (5) The authors have shown nuclear inclusion formation and aggregation of mHTT and also commented on its potential removal with the UPS system (proteasomal degradation) in vivo. As there is also a reciprocal relationship present between autophagy and proteasomal machineries, upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease. How nuclear proteasomal activity increases to tackle nuclear mHTT IBs, would be interesting to understand in the context of HD pathology. Comments from the authors in this aspect would clarify the role of multiple degradation pathways in handling mutant HTT protein in HD pathology.

      We appreciate the reviewer raising this point. We agree that there are reciprocal relationships between autophagy and the UPS (Korolchuk, Menzies et al. 2010, Park and Cuervo 2013). In general, failure in one pathway leads to compensatory upregulation of the other, and vice versa (Lee, Park et al. 2019). So, as the reviewer pointed out, “upon upregulation of autophagy machinery by mTOR inhibition proteasomal activity may decrease”. However, we proposed in the Discussion that “It is possible that stimulation of autophagy is reducing the mHTT in the cytoplasm and thereby partially relieves the burden of the proteasome both in the cytoplasm and in the nucleus so that the nuclear proteasome operates more effectively”, which is inconsistent with the general expectation of decreased UPS activity. Please note, however, that there are also instances in which the two pathways act in the same direction, e.g., autophagy inhibition disturbs UPS degradative function (Korolchuk, Mansilla et al. 2009, Park and Cuervo 2013). In any case, our statement is speculative and requires verification with additional experiments in the future. One observation reported here that may support this speculation is the reduction in the AV-non-associated forms of mHTT/p62/Ub (Fig. 7B3): some of them might reside within the nucleus, so their reduced levels may reflect increased intranuclear UPS activity, besides the other possibility, already discussed in the text, that they travel from the nucleus to the cytosol for clearance. [The last sentence has been incorporated into the text under Discussion, L628-632]

      (6) For the treatment of neurodegenerative disorders taking the temporal regulation into consideration is extremely important, as that will determine the success rate of the treatments in patients. The authors in this manuscript have clearly discussed this scenario. However, for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario. In that case, it will be complicated to initiate an early treatment regime in HD patients. If the authors can comment on and discuss the practicality of the early treatment regime for therapeutic purposes that would be impactful.

      We appreciate the reviewer raising this point and we agree with the main concern that “for neurodegenerative disordered patients, in most cases, the symptom manifestation is a late onset scenario.” This is indeed a common challenge in the development of therapies for neurodegenerative diseases. It should first be noted that the current study is an experimental therapeutic attempt in a mouse model which, consistent with previous reports (Ravikumar, Vacher et al. 2004), serves as a proof of concept for manipulating autophagy (i.e., via inhibiting mTOR in the current setting) as a potential therapy, and its clinical practicality requires further verification. Moreover, in our opinion, early diagnosis (e.g., genetic testing in individuals at higher risk for HD) may be key to overcoming the above challenge: if early diagnosis is enabled, earlier interventions would become possible. [The above description has been incorporated into the text under Discussion, L654-659]

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      Minor concerns:

      (1) Figures 1 and 2 should indicate the number of sections and mice/genotypes.

      Thank you for the suggestion; the information has been added to the figure legends.

      (2) Figure 3A2 should explain how AP, AL, pa-AL, and LY are quantified.

      Thanks for raising this point. Please note that the quantitation of AP, AL, pa-AL and LY was performed by the hue angle-based analysis which was described under “Confocal image collection and hue angle-based quantitative analysis for AV/LY subtypes” within the Materials and Methods. A phrase “(see the Materials and Methods)” has been added after the existing description “Hue angle-based analysis was performed for AV/LY subtype determination using the methods described in Lee et al., 2019” in the figure legend.
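
      For readers unfamiliar with hue angle-based classification, the general idea can be sketched as follows. This is a minimal illustration only, not the pipeline of Lee et al., 2019: the function names, the saturation cutoff, and the hue bins are hypothetical, and the actual assignment of hue ranges to AP, AL, pa-AL, and LY must be taken from the Materials and Methods.

```python
import colorsys


def hue_angle(r, g, b, min_saturation=0.2):
    """Return the hue angle in degrees [0, 360) of an 8-bit RGB pixel.

    Returns None when the pixel is too desaturated (near white, gray, or
    black) for its hue to be meaningful. `min_saturation` is a hypothetical
    cutoff chosen for illustration.
    """
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < min_saturation:
        return None
    return h * 360.0


# Hypothetical hue bins in degrees: (lower bound, upper bound, label).
# The labels are placeholders, not the published AV/LY subtype ranges.
HYPOTHETICAL_BINS = [
    (30.0, 90.0, "bin_yellow"),
    (210.0, 270.0, "bin_blue"),
    (270.0, 330.0, "bin_magenta"),
]


def classify(r, g, b, bins=HYPOTHETICAL_BINS):
    """Assign a pixel (or the mean color of a punctum) to a hue bin."""
    angle = hue_angle(r, g, b)
    if angle is None:
        return "low_saturation"
    for lower, upper, label in bins:
        if lower <= angle < upper:
            return label
    return "unassigned"
```

      Because hue is undefined for colorless pixels, the sketch treats low-saturation signal as its own category rather than forcing it into a hue bin.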

      References

      Benn, C. L., C. Landles, H. Li, A. D. Strand, B. Woodman, K. Sathasivam, S. H. Li, S. Ghazi-Noori, E. Hockly, S. M. Faruque, J. H. Cha, P. T. Sharpe, J. M. Olson, X. J. Li and G. P. Bates (2005). "Contribution of nuclear and extranuclear polyQ to neurological phenotypes in mouse models of Huntington's disease." Hum Mol Genet 14(20): 3065-3078.

      Berg, M. J., Veeranna, C. M. Rosa, A. Kumar, P. S. Mohan, P. Stavrides, D. M. Marchionini, D. S. Yang and R. A. Nixon (2024). "Pathobiology of the autophagy-lysosomal pathway in the Huntington’s disease brain." bioRxiv: 2024.05.29.596470.

      Carty, N., N. Berson, K. Tillack, C. Thiede, D. Scholz, K. Kottig, Y. Sedaghat, C. Gabrysiak, G. Yohrling, H. von der Kammer, A. Ebneth, V. Mack, I. Munoz-Sanjuan and S. Kwak (2015). "Characterization of HTT inclusion size, location, and timing in the zQ175 mouse model of Huntington's disease: an in vivo high-content imaging study." PLoS One 10(4): e0123527.

      Cornett, J., F. Cao, C. E. Wang, C. A. Ross, G. P. Bates, S. H. Li and X. J. Li (2005). "Polyglutamine expansion of huntingtin impairs its nuclear export." Nat Genet 37(2): 198-204.

      Davies, S. W., M. Turmaine, B. A. Cozens, M. DiFiglia, A. H. Sharp, C. A. Ross, E. Scherzinger, E. E. Wanker, L. Mangiarini and G. P. Bates (1997). "Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the HD mutation." Cell 90(3): 537-548.

      DeGuire, S. M., F. S. Ruggeri, M. B. Fares, A. Chiki, U. Cendrowska, G. Dietler and H. A. Lashuel (2018). "N-terminal Huntingtin (Htt) phosphorylation is a molecular switch regulating Htt aggregation, helical conformation, internalization, and nuclear targeting." J Biol Chem 293(48): 18540-18558.

      DiFiglia, M., E. Sapp, K. Chase, C. Schwarz, A. Meloni, C. Young, E. Martin, J. P. Vonsattel, R. Carraway, S. A. Reeves et al. (1995). "Huntingtin is a cytoplasmic protein associated with vesicles in human and rat brain neurons." Neuron 14(5): 1075-1081.

      DiFiglia, M., E. Sapp, K. O. Chase, S. W. Davies, G. P. Bates, J. P. Vonsattel and N. Aronin (1997). "Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain." Science 277(5334): 1990-1993.

      Gutekunst, C. A., A. I. Levey, C. J. Heilman, W. L. Whaley, H. Yi, N. R. Nash, H. D. Rees, J. J. Madden and S. M. Hersch (1995). "Identification and localization of huntingtin in brain and human lymphoblastoid cell lines with anti-fusion protein antibodies." Proc Natl Acad Sci U S A 92(19): 8710-8714.

      Hackam, A. S., R. Singaraja, T. Zhang, L. Gan and M. R. Hayden (1999). "In vitro evidence for both the nucleus and cytoplasm as subcellular sites of pathogenesis in Huntington's disease." Hum Mol Genet 8(1): 25-33.

      Kordasiewicz, H. B., L. M. Stanek, E. V. Wancewicz, C. Mazur, M. M. McAlonis, K. A. Pytel, J. W. Artates, A. Weiss, S. H. Cheng, L. S. Shihabuddin, G. Hung, C. F. Bennett and D. W. Cleveland (2012). "Sustained therapeutic reversal of Huntington's disease by transient repression of huntingtin synthesis." Neuron 74(6): 1031-1044.

      Korolchuk, V. I., A. Mansilla, F. M. Menzies and D. C. Rubinsztein (2009). "Autophagy inhibition compromises degradation of ubiquitin-proteasome pathway substrates." Mol Cell 33(4): 517-527.

      Korolchuk, V. I., F. M. Menzies and D. C. Rubinsztein (2010). "Mechanisms of cross-talk between the ubiquitin-proteasome and autophagy-lysosome systems." FEBS Lett 584(7): 1393-1398.

      Landles, C., R. E. Milton, N. Ali, R. Flomen, M. Flower, F. Schindler, C. Gomez-Paredes, M. K. Bondulich, G. F. Osborne, D. Goodwin, G. Salsbury, C. L. Benn, K. Sathasivam, E. J. Smith, S. J. Tabrizi, E. E. Wanker and G. P. Bates (2020). "Subcellular localization and formation of huntingtin aggregates correlates with symptom onset and progression in a Huntington's disease model." Brain Commun 2(2): fcaa066.

      Lee, J. H., S. Park, E. Kim and M. J. Lee (2019). "Negative-feedback coordination between proteasomal activity and autophagic flux." Autophagy 15(4): 726-728.

      Lee, J. H., M. V. Rao, D. S. Yang, P. Stavrides, E. Im, A. Pensalfini, C. Huo, P. Sarkar, T. Yoshimori and R. A. Nixon (2019). "Transgenic expression of a ratiometric autophagy probe specifically in neurons enables the interrogation of brain autophagy in vivo." Autophagy 15(3): 543-557.

      Lee, J. H., D. S. Yang, C. N. Goulbourne, E. Im, P. Stavrides, A. Pensalfini, H. Chan, C. Bouchet-Marquis, C. Bleiwas, M. J. Berg, C. Huo, J. Peddy, M. Pawlik, E. Levy, M. Rao, M. Staufenbiel and R. A. Nixon (2022). "Faulty autolysosome acidification in Alzheimer's disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques." Nat Neurosci 25(6): 688-701.

      Li, H., S. H. Li, A. L. Cheng, L. Mangiarini, G. P. Bates and X. J. Li (1999). "Ultrastructural localization and progressive formation of neuropil aggregates in Huntington's disease transgenic mice." Hum Mol Genet 8(7): 1227-1236.

      Magnuson, B., B. Ekim and D. C. Fingar (2012). "Regulation and function of ribosomal protein S6 kinase (S6K) within mTOR signalling networks." Biochem J 441(1): 1-21.

      Park, C. and A. M. Cuervo (2013). "Selective autophagy: talking with the UPS." Cell Biochem Biophys 67(1): 3-13.

      Peng, Q., B. Wu, M. Jiang, J. Jin, Z. Hou, J. Zheng, J. Zhang and W. Duan (2016). "Characterization of Behavioral, Neuropathological, Brain Metabolic and Key Molecular Changes in zQ175 Knock-In Mouse Model of Huntington's Disease." PLoS One 11(2): e0148839.

      Ravikumar, B., C. Vacher, Z. Berger, J. E. Davies, S. Luo, L. G. Oroz, F. Scaravilli, D. F. Easton, R. Duden, C. J. O'Kane and D. C. Rubinsztein (2004). "Inhibition of mTOR induces autophagy and reduces toxicity of polyglutamine expansions in fly and mouse models of Huntington disease." Nat Genet 36(6): 585-595.

      Riguet, N., A. L. Mahul-Mellier, N. Maharjan, J. Burtscher, M. Croisier, G. Knott, J. Hastings, A. Patin, V. Reiterer, H. Farhan, S. Nasarov and H. A. Lashuel (2021). "Nuclear and cytoplasmic huntingtin inclusions exhibit distinct biochemical composition, interactome and ultrastructural properties." Nat Commun 12(1): 6579.

      Scherzinger, E., R. Lurz, M. Turmaine, L. Mangiarini, B. Hollenbach, R. Hasenbank, G. P. Bates, S. W. Davies, H. Lehrach and E. E. Wanker (1997). "Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo." Cell 90(3): 549-558.

      Sharp, A. H., S. J. Loev, G. Schilling, S. H. Li, X. J. Li, J. Bao, M. V. Wagster, J. A. Kotzuk, J. P. Steiner, A. Lo et al. (1995). "Widespread expression of Huntington's disease gene (IT15) protein product." Neuron 14(5): 1065-1074.

      Tabrizi, S. J., R. Ghosh and B. R. Leavitt (2019). "Huntingtin Lowering Strategies for Disease Modification in Huntington's Disease." Neuron 101(5): 801-819.

      Wang, X. and C. G. Proud (2006). "The mTOR pathway in the control of protein synthesis." Physiology (Bethesda) 21: 362-369.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence:

      It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I correctly understand the authors' hypothesis, that CCK promotes LTP through CCKR-intracellular Ca2+-AMPAR, this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      Thank you for your question regarding the role of CCK and NMDA receptors (NMDARs) in thalamocortical LTP. We propose that CCK receptor (CCKR) activation enhances intracellular calcium levels, which are crucial for thalamocortical LTP induction. Calcium influx through NMDARs is also essential to reach the threshold required for activating downstream signaling pathways that promote LTP (Heynen and Bear, 2001). Thus, CCKRs and NMDARs may function in a complementary manner to facilitate LTP, with both contributing to the elevation of intracellular calcium.

      However, it is important to note that the postsynaptic mechanisms of thalamocortical LTP in the auditory cortex (ACx) differ from those in other sensory cortices. Studies have shown that thalamocortical LTP in the ACx appears to be less dependent on NMDARs (Chun et al., 2013) than LTP in the somatosensory or visual cortices. Our previous studies also found that while NMDAR antagonists can block HFS-induced LTP in the inner ACx, LTP can still be induced in the presence of CCK even after NMDAR blockade (Chen et al. 2019). These findings suggest that CCK may act through an alternative mechanism involving CCKR-mediated calcium signaling and AMPAR modulation, which partially compensates for the loss of NMDAR signaling. This distinction may reflect functional differences between the ACx and other sensory cortices, as highlighted in previous studies (King and Nelken, 2009).

      While our current study focuses on the role of CCKR-mediated plasticity in the auditory system, further investigations are needed to elucidate how CCKRs and NMDARs interact within the broader framework of thalamocortical neuroplasticity across different cortical regions. Understanding whether similar mechanisms operate in other sensory systems, such as the visual cortex, will be an important direction for future research.

      Heynen, A.J., and Bear, M.F. (2001). Long-term potentiation of thalamocortical transmission in the adult visual cortex in vivo. J Neurosci 21, 9801-9813. 10.1523/jneurosci.21-24-09801.2001.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      Chen, X., Li, X., Wong, Y.T., Zheng, X., Wang, H., Peng, Y., Feng, H., Feng, J., Baibado, J.T., Jesky, R., et al. (2019). Cholecystokinin release triggered by NMDA receptors produces LTP and sound-sound associative memory. Proc Natl Acad Sci U S A 116, 6397-6406. 10.1073/pnas.1816833116.

      King, A.J., and Nelken, I. (2009). Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nat Neurosci 12, 698-701.

      (2) Complexity of the Thalamocortical System:

      The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      Thank you for your valuable feedback. We would like to clarify that stimulation was delivered in the ventral division of the medial geniculate nucleus (MGv), and recording was performed in layer IV of the ACx. Targeting the MGv allows us to investigate the influence of thalamic inputs on auditory cortical responses. Layer IV of the ACx is known to receive direct thalamic projections, making it an ideal site for assessing how thalamic activity influences cortical processing. We will incorporate this clarification into the revised manuscript to enhance the robustness of our study.

      Results section:

      “Stimulation electrodes were placed in the MGB (specifically in the medial geniculate nucleus ventral subdivision, MGv), and recording electrodes were inserted into layer IV of ACx”

      “The recording electrodes were lowered into layer IV of ACx, while the stimulation electrodes were lowered into MGB (MGv subdivision). The final stimulating and recording positions were determined by maximizing the cortical fEPSP amplitude triggered by the ES in the MGB. The accuracy of electrode placement was verified through post-hoc histological examination and electrophysiological responses.”

      (3) Statistical Variability:

      Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      Thank you for your question. In our experiments, the sample size N represents the number of animals used, while n refers to the number of recordings, with each recording corresponding to a distinct pair of stimulation and recording sites. To adhere to ethical guidelines and minimize animal usage, we often perform multiple recordings within a single animal, such as from different hemispheres of the brain. Although N may appear small, our statistical analyses are based on n, ensuring sufficient data points for reliable conclusions.

      Furthermore, as our experiments are conducted in vivo, we observe lower variability in the increase of fEPSP slopes following LTP induction compared to brain slice preparations, where standard deviations exceeding 50% of the mean are common. This reduced variability likely reflects the robustness of the physiologically intact conditions in the in vivo setup.
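
      The variability criterion raised by the reviewer (a standard deviation exceeding 50% of the mean) can be checked directly as a coefficient of variation. The sketch below is illustrative only: the function name and the example values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(m)


# Hypothetical post-induction fEPSP slopes, as percent of baseline,
# one value per recording (n = 3). Not data from this study.
example_slopes = [140.0, 150.0, 160.0]

cv = coefficient_of_variation(example_slopes)
exceeds_half_of_mean = cv > 0.5  # the reviewer's "SD > 50% of the mean" criterion
```

      Computing the statistic over recordings (n) rather than animals (N) changes which samples enter the calculation, so the unit of analysis should be stated alongside the value.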

      (4) EYFP Expression and Virus Targeting:

      The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      Thank you for your question. In Figure 2A, EYFP expression indicates thalamocortical projections, while the co-expression of EYFP with PSD95 confirms the identity of thalamocortical terminals. The CCK-B receptors (CCKBR) are located on postsynaptic cortical neurons. The observed co-labeling of thalamocortical terminals and postsynaptic CCKBR suggests that CCK-expressing neurons in the medial geniculate body (MGB) can release CCK, which subsequently acts on the postsynaptic CCKBR. This evidence supports our interpretation of the functional role of CCK in modulating neural plasticity between thalamocortical inputs and cortical neurons. With Figure 2A, we aim to demonstrate that a substantial proportion of thalamocortical terminals are co-labeled with CCK receptors. We will ensure that this clarification is emphasized in the revised manuscript to address your concerns.

      Results section:

      “Cre-dependent AAV9-EFIa-DIO-ChETA-EYFP was injected into the MGB of CCK-Cre mice. EYFP labeling marked CCK-positive neurons in the MGB. The co-expression of EYFP thalamocortical projections with PSD95 confirms the identity of thalamocortical terminals (yellow), which primarily targeted layer IV of the ACx (Figure 2A, upper panel). Immunohistochemistry revealed that a substantial proportion (15 out of 19, Figure 2A lower right panel) of thalamocortical terminals (arrows) colocalize with CCK receptors (CCKBR) on postsynaptic cortical neurons in the ACx (Figure 2A lower panel), supporting the functional role of CCK in modulating thalamocortical plasticity.”

      (5) Consideration of Previous Literature:

      A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      Thank you for your valuable feedback. We will enhance our discussion on auditory thalamocortical LTP during early development and adulthood to provide a more comprehensive context for our study.

      (6) Therapeutic Implications:

      While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

      Thank you for your thoughtful feedback. We agree that the therapeutic applications mentioned in our study are speculative at this stage and should be regarded as a forward-looking perspective rather than definitive conclusions. Our intention was to highlight the broader potential of our findings to inspire further research, rather than to propose immediate clinical applications.

      In light of your feedback, we have adjusted the language in the manuscript to reflect a more cautious interpretation. Speculative discussions are now explicitly framed as hypotheses or possibilities for future exploration. We emphasize that our findings provide a foundation for further investigations into CCK-based plasticity and its implications.

      We believe that appropriately framed forward-thinking discussions are valuable in guiding the direction of future research. We sincerely hope that our current and future work will contribute to a deeper understanding of thalamocortical plasticity and, over time, potentially lead to advancements in human health.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, they assessed the role of CCK signaling across multiple mechanisms of LTP induction with the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish an essential role of auditory thalamocortical LTP on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a window of time that CCK is produced throughout early and middle adulthood in auditory thalamocortical neurons to establish a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous, multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      Thank you for this insightful comment. We agree that the differential roles of PV-interneurons and pyramidal neurons in CCK-dependent thalamocortical plasticity remain unclear and acknowledge this as an important limitation of our study. Our primary focus was on pyramidal neurons, as our in vivo electrophysiological recordings measured the fEPSP slope in layer IV of the auditory cortex, which primarily reflects excitatory synaptic activity. However, we recognize the critical role of the excitatory-inhibitory balance in cortical function and the potential contribution of PV-interneurons to this process. In future studies, we plan to utilize techniques such as optogenetics, two-photon calcium imaging and cell-type-specific recordings to investigate the distinct contributions of PV-interneurons and pyramidal neurons to CCK-dependent thalamocortical plasticity, thereby providing a more comprehensive understanding of how CCK modulates thalamocortical circuits.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      Thank you for this thoughtful comment. We acknowledge that our study did not directly address the fidelity of temporal processing, which is indeed a critical aspect of auditory function. Our behavioral experiments primarily focused on linking frequency discrimination to the role of CCK in synaptic strengthening within the auditory thalamocortical pathway. However, we agree that enhanced responsivity of the system could also impact temporal processing dynamics, such as the precise timing of auditory responses. Whether this modulation improves or reduces the fidelity of temporal processing remains an open and important question.

      As you noted, understanding these dynamics will require a deeper investigation into the interactions between different cell types, particularly the balance between excitatory and inhibitory neurons. Exploring how CCK modulation affects both the circuit and cellular levels in temporal processing is an important direction for future research, which we plan to pursue. Thank you again for raising this important point.

      Discussion section:

      “While we focused on homosynaptic plasticity at thalamocortical synapses by recording only fEPSPs in layer IV of ACx, it is essential to further explore heterosynaptic effects of CCK released from thalamocortical synapses on intracortical circuits, particularly its role in modulating the excitatory-inhibitory balance. PV-interneurons, as key regulators of cortical inhibition, may contribute to the temporal fidelity of sensory processing, which is critical for auditory perception (Nocon et al., 2023; Cai et al., 2018). Additionally, CCK may facilitate cross-modal plasticity by modulating heterosynaptic plasticity in interconnected cortical areas. Future studies would provide valuable insights into the broader role of CCK in shaping sensory processing and cortical network dynamics.”

      Nocon, J.C., Gritton, H.J., James, N.M., Mount, R.A., Qu, Z., Han, X., and Sen, K. (2023). Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Communications Biology 6, 751. 10.1038/s42003-023-05126-0.

      Cai, D., Han, R., Liu, M., Xie, F., You, L., Zheng, Y., Zhao, L., Yao, J., Wang, Y., Yue, Y., et al. (2018). A Critical Role of Inhibition in Temporal Processing Maturation in the Primary Auditory Cortex. Cereb Cortex 28, 1610-1624. 10.1093/cercor/bhx057.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      Thank you for your insightful comment. In our in vivo electrophysiological experiments on LTP induction, we recorded neural activity for over 1.5 hours to assess changes in neuronal responses over time, both prior to and following the induction. While single neuron firing data can provide valuable insights, such measurements are inherently more variable due to factors like cortical state fluctuations and the condition of nearby neurons, which makes them less reliable for long-term analysis. For this reason, we focused on fEPSP, as it offers a more stable and robust readout of synaptic activity over extended periods.

      We appreciate your suggestion and recognize the value of single-neuron data in understanding how CCK and HFS affect temporal processing and excitability. In future studies, we will consider incorporating single-neuron analyses to complement our synaptic-level findings and provide a more comprehensive understanding of these mechanisms.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      Thank you for your comment. Data from the CCK-KO mice are presented in Figure 3A (far right) and in the upper panel of Figure 3B (far right). In the lower panel of Figure 3B, data from the CCK-KO group are not shown because the normalized values for this group were essentially zero, as expected due to the absence of CCK mRNA.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      Thank you for raising this important point. Pre-pulse inhibition (PPI) of the acoustic startle response indeed involves multiple brain regions, with the ascending auditory pathway playing a key role (Gómez-Nieto et al., 2020). Within the auditory cortex, layer IV neurons receive tonotopically organized inputs from the medial geniculate nucleus and are critical for integrating thalamic inputs and shaping auditory processing.

      In our behavioral experiments, mice were required to discriminate pre-pulses of varying frequencies against a continuous background sound. Given the role of auditory cortical neurons in integrating thalamic inputs and shaping auditory processing, it is likely that synaptic plasticity in these neurons contributes to the enhanced discrimination of pre-pulses. Supporting this idea, our previous work demonstrated that local infusion of CCK, paired with weak acoustic stimuli, significantly increased auditory responses in the auditory cortex (Li et al., 2014). In the current study, we further showed that CCK release during high-frequency stimulation of the thalamocortical pathway induced LTP in layer IV of the auditory cortex. Together, these findings suggest that CCK-dependent synaptic plasticity in layer IV may amplify the cortical representation of weak auditory inputs, thereby improving pre-pulse detection and enhancing PPI performance.

      It is also worth noting that aged mice with hearing loss typically exhibit PPI deficits due to impaired auditory processing (Ouagazzal et al., 2006 and Young et al., 2010). We propose that enhanced plasticity in the thalamocortical pathway, mediated by CCK, might partially compensate for these deficits by amplifying residual auditory signals in aged mice. However, the precise mechanisms by which layer IV synaptic plasticity modulates PPI behavior remain to be fully understood. Given the complex dynamics of sensory processing, future studies could explore how layer IV neurons interact with other cortical and subcortical circuits involved in PPI, as well as the specific contributions of excitatory and inhibitory cell types. These investigations will help provide a more comprehensive understanding of the role of CCK in modulating sensory gating and auditory processing.

      Gómez-Nieto, R., Hormigo, S., & López, D. E. (2020). Prepulse inhibition of the auditory startle reflex assessment as a hallmark of brainstem sensorimotor gating mechanisms. Brain sciences, 10(9), 639.

      Li, X., Yu, K., Zhang, Z., Sun, W., Yang, Z., Feng, J., Chen, X., Liu, C.-H., Wang, H., Guo, Y.P., and He, J. (2014). Cholecystokinin from the entorhinal cortex enables neural plasticity in the auditory cortex. Cell Research 24, 307-330. 10.1038/cr.2013.164.

      Ouagazzal, A. M., Reiss, D., & Romand, R. (2006). Effects of age-related hearing loss on startle reflex and prepulse inhibition in mice on pure and mixed C57BL and 129 genetic background. Behavioural brain research, 172(2), 307-315.

      Young, J. W., Wallace, C. K., Geyer, M. A., & Risbrough, V. B. (2010). Age-associated improvements in cross-modal prepulse inhibition in mice. Behavioral neuroscience, 124(1), 133.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) In Figure 1, the authors used different metrics for fEPSP strength. In Figure 1D, the authors used the slope, while they used the amplitude in Figure 1G. It is known that the two metrics are different from each other. While the slope is calculated from the linear regression between the voltage change per time of the rising phase of the fEPSP, the amplitude represents the voltage value of the fEPSP's peak. Please clarify here and in the method what metric you used, because the two terms are not interchangeable.

      Thank you for pointing out this oversight in our manuscript. We confirm that we used the slope of the fEPSP as the metric for assessing synaptic strength throughout the study, including both Figure 1D and Figure 1G. We will make the necessary corrections to ensure clarity and consistency. Thank you for bringing this to our attention.
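
      The distinction the reviewer draws between the two metrics can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: it assumes a negative-going fEPSP, and the 20-80% window used to define the rising phase is a hypothetical choice that the manuscript does not specify.

```python
def fepsp_slope_and_amplitude(times_ms, volts_mv, lo=0.2, hi=0.8):
    """Return (slope in mV/ms, peak amplitude in mV) for a negative-going fEPSP.

    lo and hi (fractions of the peak) delimit the rising phase used for the
    fit; both are illustrative assumptions, not values from the manuscript.
    """
    # Amplitude: the voltage at the peak (most negative sample).
    peak_idx = min(range(len(volts_mv)), key=lambda i: volts_mv[i])
    peak = abs(volts_mv[peak_idx])

    # Rising phase: samples up to the peak lying between lo and hi of the peak.
    pts = [
        (t, v)
        for t, v in zip(times_ms[: peak_idx + 1], volts_mv[: peak_idx + 1])
        if lo * peak <= abs(v) <= hi * peak
    ]
    if len(pts) < 2:
        raise ValueError("not enough samples in the rising phase to fit a slope")

    # Slope: least-squares linear regression of voltage on time.
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    num = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den, peak
```

      On a trace that falls linearly to its peak, the slope is set by how fast the voltage changes, while the amplitude depends only on how far it goes, so the two numbers carry different units and need not change together.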

      (2) The methods do not give details about the CCK-KO mice. Please give such details. Although the authors used the CCK-KO mouse model as a control, I think that it is not a good choice to test the hypothesis mentioned in lines 165 and 166. The experiment was supposed to monitor the CCK-BR activity after HFS of the MGB and answer whether the CCK-BR will get activated by thalamic stimulation, but the CCK-KO mouse does not have CCK to be released after the optogenetic activation of the Chrimson probe. Therefore, it is expected to give nothing as if the experimenter runs an experiment without intervention. I think that the appropriate way to examine the hypothesis is to compare mice that were either injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato or AAV9-Syn-FLEX-tdTomato. However, CCK-KO would be a perfect model to confirm that LTP can be only generated dependently on CCK, by simply running the HFS of the MGB that would be associated with the cortical recording of the fEPSP. This also will rule out the assumption that the authors mentioned in lines 191 and 192.

      Thank you for your valuable feedback. The rationale behind our experimental design was to validate the newly developed CCK sensor and confirm its specificity. We aimed to verify CCK release post-HFS by comparing the responses of the CCK sensor in CCK-KO mice and CCK-Cre mice. This comparison allowed us to determine that the observed increase in fluorescence intensity post-HFS was specifically due to CCK release, rather than other neurotransmitters induced by HFS.

      We appreciate your suggestion to compare mice injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato and AAV9-Syn-FLEX-tdTomato, as it is indeed a valuable approach for directly testing the hypothesis regarding CCK-BR activation. However, we prioritized using the CCK-KO model to validate the CCK sensor's efficacy and specificity. The validation can be inferred by comparing the CCK sensor activity before and after HFS.

      Regarding concerns mentioned in lines 191 and 192 about potential CCK release from other projections via indirect polysynaptic activation, CCK-KO mice were not suitable for this aspect due to their global knockout of CCK. To address this limitation, we utilized shRNA to specifically down-regulate Cck expression in MGB neurons. This approach focused on the necessity of CCK released from thalamocortical projections for the observed LTP and effectively ruled out the possibility of indirect polysynaptic activation.

      We also acknowledge that the methods section lacked sufficient details about the CCK-KO mice, which may have caused confusion. In the revised methods section, we will add the following details:

      (1) The genotype of the CCK-KO mice used in this study (CCK-ires-CreERT2, Jax#012710).

      (2) A brief description of the CCK-KO validation, emphasizing the absence of CCK mRNA in these mice (as shown in Figure 3A and 3B).

      (3) The experimental purpose of using CCK-KO mice to validate the specificity of the CCK sensor.

      We believe these additions will clarify the rationale for using CCK-KO mice and their role in this study. Thank you again for highlighting these important points.

      (3) Figure 3C: The authors should examine if there is a difference in the baseline of fEPSPs across different age groups as the dependence on the normalization in the analysis within each group would hide if there were any difference of the baseline slope of fEPSP between groups which could be related to any misleading difference after HFS. Also, I wonder about the absence of LTP in P20, which is a closer age to the critical period. Could the authors discuss that, please?

      Thank you for your insightful feedback. To address your concern regarding baseline differences in fEPSP slopes across age groups, we conducted an additional analysis. Baseline fEPSP slopes across the three groups (P20, 8w, 18m), normalized to the 8w group, were 64.8 ± 13.1%, 100.0 ± 20.4%, and 58.8 ± 10.3%, respectively. While there was a trend suggesting smaller fEPSP slopes in the P20 and 18m groups compared to the young adult group, these differences were not statistically significant due to data variability (P20 vs. 8w, P = 0.319; 8w vs. 18m, P = 0.147; P20 vs. 18m, P = 1.0, one-way ANOVA). These results suggest that baseline variability is unlikely to confound the observed differences in LTP after HFS. Furthermore, we ensured that normalization minimized any potential baseline effects.
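
      As a rough illustration of what normalizing baselines to a reference group involves, the sketch below expresses each animal's baseline slope as a percentage of the mean of the reference group and then summarizes each group. The function name, the group labels passed to it, and the choice to report the spread as a standard error of the mean are all assumptions; the response does not say whether its ± values are SEM or SD, and no real data are reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_reference(groups, reference):
    """Express each group's values as a percentage of the reference group's mean.

    groups: dict mapping a group label to a list of baseline fEPSP slopes.
    Returns a dict mapping each label to (mean %, SEM %).
    """
    ref_mean = mean(groups[reference])
    summary = {}
    for label, values in groups.items():
        pct = [100.0 * v / ref_mean for v in values]
        # SEM is undefined for a single sample; report 0.0 in that case.
        sem = stdev(pct) / sqrt(len(pct)) if len(pct) > 1 else 0.0
        summary[label] = (mean(pct), sem)
    return summary
```

      By construction the reference group always averages 100%, so only its spread, not its mean, is informative after normalization.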

      Regarding the absence of LTP in P20, this likely reflects developmental regulation of CCKBR expression in the auditory cortex (ACx). The HFS-induced thalamocortical LTP observed in our study is CCK-dependent and mechanistically distinct from the NMDA-dependent thalamocortical LTP during the critical period. Specifically, correlated pre- and postsynaptic activity can induce NMDA-dependent thalamocortical LTP only during an early critical period corresponding to the first several postnatal days, after which this pairing becomes ineffective starting from the second postnatal week (Crair and Malenka, 1995; Isaac et al., 1997; Chun et al., 2013). In contrast, the CCK-dependent thalamocortical LTP induced by HFS is robust in adult mice but appears absent in P20, likely due to the lack of postsynaptic CCKBR expression in the ACx at this developmental stage.

      We will include these clarifications in the revised manuscript, particularly in the Discussion section, to provide a more comprehensive explanation of our findings. Thank you for your valuable comments and suggestions.

      Crair, M.C., and Malenka, R.C. (1995). A critical period for long-term potentiation at thalamocortical synapses. Nature 375, 325-328. 10.1038/375325a0.

      Isaac, J.T.R., Crair, M.C., Nicoll, R.A., and Malenka, R.C. (1997). Silent Synapses during Development of Thalamocortical Inputs. Neuron 18, 269-280. https://doi.org/10.1016/S0896-6273(00)80267-6.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      (4) Figure 4F: It is noticed that the baseline fEPSP of the CCK group and ACSF groups were different, which raises a concern about the baseline differences between treatment groups.

      Thank you for your valuable feedback and for pointing out this important detail. We apologize for any confusion caused by the presentation of the data. As noted in the figure legend, the scale bars for the fEPSPs were different between the left (0.1 mV) and right panels (20 µV). This difference in scale may have created the perception of baseline differences between the CCK and ACSF groups. To enhance clarity and avoid potential misunderstanding, we will unify the scale bar values in the revised figure. This adjustment will provide a clearer and more accurate comparison of fEPSPs between groups. Thank you again for bringing this issue to our attention.

      (5) From Figure S2D, it seems that different animals were injected with the drug and ACSF. Therefore, how the authors validate the position of the recording electrode to the cortical area of certain CF and relative EF. Also, there is not enough information about the basis of the selection of the EF. Should it be lower than the CF with a certain value? Was the EF determined after the initial tuning curve in each case? To mitigate this difference, it would be appropriate if the authors examined the presence of a significant difference in the tuning width and CFs between animals exposed to ACSF and CCK-4. This will give some validation of a balanced experiment between ACSF and CCK-4. I wonder also why the authors used rats here not mice, as it will be easier to interpret the results came from the same species.

      Thank you for your thoughtful comments. The effective frequency (EF) was determined after measuring the initial tuning curve for each case. The EF was selected to elicit a clear sound response while maintaining a sufficient distance from the characteristic frequency (CF) to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF. While there were individual differences in EF selection among animals, the methodology for determining EF was standardized and applied consistently across both the ACSF and CCK-4 groups.

      Regarding the use of rats in these experiments, these studies were conducted prior to our current work with mice. The findings in rats provide valuable insights that support our current results in mice. Since the rat data are supplementary to the primary findings, we included them as supplementary material to provide additional context and validation. Furthermore, in consideration of animal welfare, we chose not to replicate these experiments in mice, as the findings from rats were sufficient to support our conclusions.

      Methods section:

      “The tuning curve was determined by plotting the lowest intensity at which the neuron responded to different tones. The characteristic frequency (CF) is defined as the frequency corresponding to the lowest point on this curve. The effective frequency (EF) was determined to elicit a clear sound response while maintaining a sufficient distance from the CF to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF.”
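
      The two definitions in this passage reduce to simple operations: the CF is the frequency with the lowest threshold, and moving a fixed number of octaves is a multiplication by a power of two. The Python sketch below shows both under stated assumptions: the tuning curve is a hypothetical mapping from tone frequency (Hz) to threshold (dB SPL), and the starting point of the tuning peak is taken as an input, because the passage does not give a formula for locating the onset of the fastest rising phase.

```python
def characteristic_frequency(tuning_curve):
    """Return the frequency (Hz) with the lowest response threshold.

    tuning_curve: dict mapping tone frequency in Hz to threshold in dB SPL.
    """
    return min(tuning_curve, key=tuning_curve.get)


def shift_toward_cf(start_hz, cf_hz, octaves):
    """Move `octaves` along a log2 frequency axis from start_hz toward cf_hz.

    One octave is a factor of two, so the shift multiplies (or divides)
    the starting frequency by 2 ** octaves.
    """
    direction = 1 if cf_hz > start_hz else -1
    return start_hz * 2 ** (direction * octaves)
```

      For example, with a starting point below the CF, `shift_toward_cf(start_hz, cf_hz, 0.2)` raises the frequency by a factor of 2 ** 0.2 (about 15%), and 0.4 octaves raises it by about 32%.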

      (6) Lines 384-386: There are no figures named 5H and I.

      Thank you for pointing this out. The references to Figures 5H and 5I were incorrect and should have referred to Figures 5C and 5D. We sincerely apologize for this oversight and will correct these errors in the revised manuscript to ensure clarity and accuracy. Thank you again for bringing this to our attention.

      (7) The authors should mention the sex of the animals used.

      Thank you for your comment and for highlighting this important detail. The sex of the animals used in this study is specified in the Animals section of the Methods: "In the present study, male mice and rats were used to investigate thalamocortical LTP." We appreciate your careful attention to this point and will ensure that this detail remains clearly stated in the manuscript.

      (8) Lines 534 and 648: These coordinates are difficult to understand. Since the experiment was done on both mice and rats, we need a clear description of the coordinates in both. Also, I think that you should mention the lateral distance from the sagittal suture as the ventral coordinates should be calculated from the surface of the skull above the AC and not from the sagittal suture.

      Thank you for your valuable feedback and for pointing out this important issue. We apologize for any confusion caused by our description of the coordinates. The term “ventral” was deliberately used because the auditory cortex is located on the lateral side of the skull, which may have caused some misunderstanding.

      To provide a clearer and more accurate description of the coordinates, we will revise the text in the manuscript as follows: “A craniotomy was performed at the temporal bone (-2 to -4 mm posterior and -1.5 to -3 mm ventral to bregma for mice; -3.0 to -5.0 mm posterior and -2.5 to -6.5 mm ventral to bregma for rats) to access the auditory cortex.”

      We appreciate your attention to these details and will ensure that the revised manuscript includes this clarification to improve accuracy and eliminate potential confusion. Thank you again for bringing this to our attention.

      (9) Line 536: The author should specify that these coordinates are for the experiment done on mice.

      Thank you for your valuable feedback. We will revise the manuscript to explicitly specify that these coordinates refer to the experiments conducted on mice. This clarification will help improve the clarity and precision of the manuscript. We greatly appreciate your attention to this point and your effort to enhance the quality of our work.

      Methods section:

      “and a hole was drilled in the skull according to the coordinates of the ventral division of the MGB (MGv, AP: -3.2 mm, ML: 2.1 mm, DV: 3.0 mm) for experiments conducted on mice.”

      (10) Line 590: Please add the specifications of the stimulating electrode. Is it unipolar or bipolar? What is the cat.# provided by FHC?

      Thank you for your valuable feedback. The electrodes used in the experiments are unipolar. We will include the catalog number provided by FHC in the revised manuscript for clarity. The revised text will be updated as follows:

      “In HFS-induced thalamocortical LTP experiments, two customized microelectrode arrays with four tungsten unipolar electrodes each, impedance: 0.5-1.0 MΩ (recording: CAT.# UEWSFGSECNND, FHC, U.S.), and 200-500 kΩ (stimulating: CAT.# UEWSDGSEBNND, FHC, U.S.), were used for the auditory cortical neuronal activity recording and MGB ES, respectively.”

      We appreciate your attention to this detail, and we will ensure that the revised manuscript reflects this clarification accurately.

      (11) Lines 612-614: There are no details of how the optic fiber was inserted or post-examined. If there is a word limitation, the authors may reference another study showing these procedures.

      Thank you for your insightful comment and for highlighting this important aspect of the methodology. To address this, we will reference the study by Sun et al. (2024) in the revised manuscript, which provides detailed procedures for optic fiber insertion and post-examination. We believe that this reference will help enhance the clarity and completeness of the methods section.

      Sun, W., Wu, H., Peng, Y., Zheng, X., Li, J., Zeng, D., Tang, P., Zhao, M., Feng, H., Li, H., et al. (2024). Heterosynaptic plasticity of the visuo-auditory projection requires cholecystokinin released from entorhinal cortex afferents. eLife 13, e83356. 10.7554/eLife.83356.

      We appreciate your valuable suggestion, which will contribute to improving the quality of the manuscript.

      Minor concerns:

      (1) The definition of HFS was repeated many times throughout the manuscript. Please mention the defined name for the first time in the manuscript only followed by its abbreviation (HFS).

      Thank you for your suggestion and for pointing out this important detail. We will revise the manuscript to ensure that all abbreviations are defined only upon their first mention in the manuscript, with subsequent mentions using the abbreviations consistently. We appreciate your careful attention to detail and your effort to help improve the manuscript.

      (2) Line 173: There is a difference between here and the methods section (620 nm here and 635 nm there) please correct which wavelength the authors used.

      Thank you for your careful review and for bringing this discrepancy to our attention. We have corrected the inconsistency, and the wavelength has been unified throughout the manuscript to ensure accuracy and clarity. The revised text now reads as follows:

      “The fluorescent signal was monitored for 25s before and 60s after the HFLS (5~10 mW, 620 nm) or HFS application.”

      We appreciate your valuable feedback, which has helped us improve the precision and consistency of the manuscript.

      (3) Line 185: I think the authors should refer to Figure 2G before mentioning the statistical results.

      Thank you for your careful review and for pointing out this oversight. We have now added a reference to Figure 2G at the appropriate location to ensure clarity and logical flow in the manuscript, as recommended.

      (4) Line 202: I think the authors should refer to Figure 2J before mentioning the statistical results.

      Thank you again for your careful review and for highlighting this point. We have revised the manuscript to include a reference to Figure 2J before mentioning the statistical results.

      We appreciate your valuable feedback, which has helped us improve the accuracy and presentation of the results.

      (5) Line 260: Please add appropriate references at the end of the sentence to support the argument.

      Thank you for your valuable suggestion. To address this, we have added appropriate references to support the statement regarding the multiple steps involved between mRNA expression and neuropeptide release. Additionally, we have revised the statement to adopt a more cautious interpretation. The revised text is as follows:

      “It is widely recognized that mRNA levels do not always directly correlate with peptide levels due to multiple steps involved in peptide synthesis and processing, including translation, post-translational modifications, packaging, transportation, and proteolytic cleavage, all of which require various enzymes and regulatory mechanisms (38-41). A disruption at any stage in this process could lead to impaired CCK release, even when Cck mRNA is present.”

      We have included the following references to support this statement:

      38. Mierke, C.T. (2020). Translation and Post-translational Modifications in Protein Biosynthesis. In Cellular Mechanics and Biophysics: Structure and Function of Basic Cellular Components Regulating Cell Mechanics, C.T. Mierke, ed. (Springer International Publishing), pp. 595-665. 10.1007/978-3-030-58532-7_14.

      39. Gualillo, O., Lago, F., Casanueva, F.F., and Dieguez, C. (2006). One ancestor, several peptides post-translational modifications of preproghrelin generate several peptides with antithetical effects. Mol Cell Endocrinol 256, 1-8. 10.1016/j.mce.2006.05.007.

      40. Sossin, W.S., Fisher, J.M., and Scheller, R.H. (1989). Cellular and molecular biology of neuropeptide processing and packaging. Neuron 2, 1407-1417. https://doi.org/10.1016/0896-6273(89)90186-4.

      41. Hook, V., Funkelstein, L., Lu, D., Bark, S., Wegrzyn, J., and Hwang, S.R. (2008). Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 48, 393-423. 10.1146/annurev.pharmtox.48.113006.094812.

      We greatly appreciate your helpful feedback, which has allowed us to improve both the accuracy and the depth of discussion in the manuscript.

      (6) Line 278: The authors mentioned "due to the absence of CCK in aged animals", which was not an appropriate description. It should be a reduction of CCK gene expression or a possible deficient CCK release.

      Thank you for your careful review and for pointing out the inaccuracy in our description. We agree with your suggestion and have revised the statement to more appropriately reflect the findings.

      “Our findings revealed that thalamocortical LTP cannot be induced in aged mice, likely due to insufficient CCK release, despite intact CCKBR expression.”

      This revision ensures a more accurate and precise description of the potential mechanisms underlying the observed phenomenon. We greatly appreciate your valuable feedback, which has helped us improve the clarity and accuracy of the manuscript.

      (7) Line 291: The authors mentioned that "without MGB stimulation", which is confusing. The MGB was stimulated with a single electrical pulse to evoke cortical fEPSPs. Therefore it should be "without HFS of MGB".

      Thank you for pointing this out and for highlighting the potential confusion caused by our original phrasing. Upon review, we recognize that our original phrasing "without MGB stimulation" may have been unclear and could have led to misinterpretation. To clarify, our intention was to describe the period during which CCK was present without any stimulation of the MGB.

      It is important to note that, in the presence of CCK, LTP can be induced even with low-frequency stimulation, including in aged mice. This observation underscores the potent effect of CCK in facilitating thalamocortical LTP, regardless of the specific stimulation protocol used.

      To address this issue, we have revised the sentence for improved clarity as follows:

      "To investigate whether CCK alone is sufficient to induce thalamocortical LTP without activating thalamocortical projections, we infused CCK-4 into the ACx of young adult mice immediately after baseline fEPSPs recording. Stimulation was then paused for 15 min to allow for CCK degradation, after which recording resumed."

      We believe this revision resolves the misunderstanding and provides a clearer and more accurate description of the experimental context. We greatly appreciate your insightful feedback, which has helped us refine the manuscript for clarity and precision.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Line 99, 134, possibly other locations: "site" to "sites".

      Thank you for your careful review. We appreciate your attention to detail and have made the necessary corrections in the manuscript.

      (2) Throughout the manuscript there are some minor issues with language choice and subtle phrasing errors and I suggest English language editing.

      Thank you for your suggestion. In response, we have thoroughly reviewed the manuscript and addressed issues related to language choice and phrasing. The text has been carefully edited to ensure clarity, precision, and consistency. We believe these revisions have significantly enhanced the overall quality of the manuscript. We greatly appreciate your feedback, which has been invaluable in improving the presentation of our work.

      (3) Based on the experimental configurations, I do not think it is a problematic caveat, but authors should be aware of the high likelihood of AAV9 jumping synapses relative to other AAV serotypes.

      Thank you for bringing up the potential of AAV9 crossing synapses, a recognized characteristic of this serotype. We appreciate your observation regarding its relevance to our experimental design. In our study, we carefully considered the possibility of trans-synaptic transfer during both the experimental design and data interpretation phases. To minimize the likelihood of significant trans-synaptic spread, we implemented several measures, including controlling the injection volume, using a slow injection rate, and limiting the viral expression time. Post-hoc histological analyses confirmed that the expression of AAV9 was largely confined to the intended regions, with limited evidence of synaptic jumping under our experimental conditions.

      While we acknowledge the inherent potential for AAV9 to cross synapses, we believe this effect does not substantially confound the interpretation of our findings in the current study. To address this concern, we have added a brief discussion on this point in the revised manuscript to enhance clarity. We greatly appreciate your insightful comment, which has helped us further refine our work.

      Discussion section:

      “One potential limitation of our study is the trans-synaptic transfer property of AAV9. To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      (4) The trace identifiers (1-4) do not seem correctly placed/colored in Figure S1D. Please check others carefully.

      Thank you for your careful review and for bringing this issue to our attention. We have corrected the trace identifiers in Figure S1D. Additionally, we have carefully reviewed all other figures to ensure their accuracy and consistency. We greatly appreciate your attention to detail, which has helped improve the overall quality of the manuscript.

      (5) Please provide a value of the laser power range based on calibrated values.

      Thank you for your suggestion. We have included the calibrated laser power range in the revised manuscript as follows:

      “The laser stimulation was produced by a laser generator (5-20 mW(30), Wavelength: 473 nm, 620 nm; CNI laser, China) controlled by an RX6 system and delivered to the brain via an optic fiber (Thorlabs, U.S.) connected to the generator.”

      We appreciate your feedback, which has helped improve the clarity and precision of our methodological description.

      (6) It would be useful to annotate figures in a way that identifies in which transgenic mice experiments are being performed.

      Thank you for your valuable suggestion. We will add annotations to the figures to explicitly identify the type of mice used in each experiment. We believe this enhancement will improve the clarity and accessibility of our results. We greatly appreciate your input in making our manuscript more informative.

      (7) Please comment on the rigor you use to address the accuracy of viral injections. How often did they spread outside of the MGB/AC?

      Thank you for raising this important question regarding the accuracy of viral injections and the potential spread outside the MGB or AC. Below, we provide details for each set of experiments:

      shRNA Experiments:

      For the shRNA experiments targeting the MGB, our primary goal was to achieve comprehensive coverage of the entire MGB. To this end, we used larger injection volumes and multiple injection sites, which inevitably resulted in some viral spread beyond the MGB. However, this approach was necessary to ensure robust knockdown effects that were representative of the entire MGB. While strict confinement to specific subregions could not be guaranteed, this strategy allowed us to prioritize the effectiveness of the knockdown within the target region.

      Fiber photometry Experiments:

      For the fiber photometry experiments targeting the auditory cortex (AC), we used larger injection volumes and multiple injection sites to cover its relatively large size. Although this approach might have resulted in some CCK-sensor virus spread outside the AC, the placement of the optic fiber was guided by the location of the auditory cortex. Consequently, any minor viral expression outside the AC would not affect the experimental results, as recordings were confined to the intended area through precise fiber placement.  

      Optogenetic Experiments:

      For the optogenetic experiments targeting the MGB, we specifically injected virus into the MGv subregion. To minimize viral spread, we employed several strategies, including the use of fine injection needles, waiting for tissue stabilization (7 minutes post-needle insertion), delivering small volumes at a slow rate to prevent backflow, aspirating 5 nL of the solution post-injection, and raising the needle by 100 μm before waiting an additional 5 minutes prior to full retraction. These measures significantly reduced the risk of viral leakage to adjacent regions.

      Histological Validation:

      After the electrophysiological experiments, we systematically verified the accuracy of viral expression by examining histological sections to ensure that the expression was primarily localized within the intended regions.

      Terminology in the Manuscript:

      In the manuscript, we deliberately used the term "MGB" rather than specifically "MGv" to transparently acknowledge the potential for viral spread in some experiments.

      We hope this explanation clarifies the strategies we employed to address the accuracy of viral injections, as well as how we managed potential viral spread. We have also added brief information to the revised manuscript to reflect these points and acknowledge the inherent variability in viral delivery.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive and helpful comments, which led us to make major changes in the model and manuscript, including adding the results of new experiments and analyses. We believe that the revised manuscript is much better than the previous version and that it addresses all issues raised by the reviewers.

      Summary of changes made in the revised manuscript:

      (1) We increased the training set size from 39 video clips to 97 video clips and the testing set size from 25 video clips to 60 video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.

      (2) We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy.

      (3) The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      (4) In addition, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      (5) Analyzing urination and defecation dynamics in an additional strain of mice revealed interesting strain-specific features, as discussed in the revised manuscript.

      (6) Overall, we found DeePosit accuracy to be stable with no significant bias across stages of the experiment, types of the experiment, gender of the mice, strain of mice, and across experimental conditions.

      (7) We also compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy over both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets. 

      (8) As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C (Figure 3—Figure Supplement 2).

      (9) We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      (10) In the revised paper, we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript provides a novel method for the automated detection of scent marks from urine and feces in rodents. Given the importance of scent communication in these animals and their role as model organisms, this is a welcome tool.

      We thank the reviewer for the positive assessment of our tool.

      Strengths:

      The method uses a single video stream (thermal video) to allow for the distinction between urine and feces. It is automated.

      Weaknesses:

      The accuracy level shown is lower than may be practically useful for many studies. The accuracy of urine is 80%. 

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      This is understandable given the variability of urine in its deposition, but makes it challenging to know if the data is accurate. If the same kinds of mistakes are maintained across many conditions it may be reasonable to use the software (i.e., if everyone is under/over counted to the same extent). Differences in deposition on the scale of 20% would be challenging to be confident in with the current method, though differences of the magnitude may be of biological interest. Understanding how well the data maintain the same relative ranking of individuals across various timing and spatial deposition metrics may help provide further evidence for the utility of the method.

      The additional test videos allowed us to test the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results.

      Reviewer #2 (Public Review):

      Summary:

      The authors built a tool to extract the timing and location of mouse urine and fecal deposits in their laboratory set up. They indicate that they are happy with the results they achieved in this effort.

      Yes, we are.

      The authors note urine is thought to be an important piece of an animal's behavioral repertoire and communication toolkit so methods that make studying these dynamics easier would be impactful.

      We thank the reviewer for the positive assessment of our work.

      Strengths:

      With the proposed method, the authors are able to detect 79% of the urine that is present and 84% of the feces that is present in a mostly automated way.

      Weaknesses:

      The method proposed has a large number of design choices across two detection steps that aren't investigated. I.e. do other design choices make the performance better, worse, or the same? 

      We chose to use a heuristic preliminary detection algorithm for the detection of warm blobs, since warm blobs can be robustly detected with heuristic algorithms without the need for a training set. This design selection might allow easier adaptation of our algorithm for different types of arenas. Another advantage of using a heuristic preliminary detection is the easy control of the preliminary detection parameters such as the minimum temperature difference for detecting a blob, size limits of the detected blob, cooldown rate and so on that may help in adopting it to new conditions. As for the classifier, we chose to feed it with a relatively small window surrounding each preliminary detection, and hence it is not affected by the arena’s appearance outside of its region of interest. This should allow lower sensitivity to the arena’s appearance.  

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.
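
      To illustrate the kind of thresholding that a heuristic warm-blob detection relies on, the following minimal sketch flags pixels that are warmer than a background frame by more than a threshold and groups them into blobs. It is not the DeePosit implementation: the function name `find_warm_blobs`, the 4-connectivity rule, the `min_size` parameter, and the toy frames are all assumptions made for illustration. Only the 1.6°C threshold is taken from the response above.

```python
from collections import deque


def find_warm_blobs(frame, background, threshold_c=1.6, min_size=1):
    """Return a list of blobs; each blob is a list of (row, col) pixels.

    A pixel is 'warm' when frame - background exceeds threshold_c.
    Warm pixels are grouped by 4-connectivity (an assumption of this sketch).
    """
    rows, cols = len(frame), len(frame[0])
    warm = [[frame[r][c] - background[r][c] > threshold_c for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not warm[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited warm pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and warm[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard blobs smaller than the (hypothetical) size limit.
            if len(blob) >= min_size:
                blobs.append(blob)
    return blobs


# Toy 4x5 thermal frames in degrees Celsius (made-up values).
background = [[22.0] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 25.0  # one two-pixel warm spot
frame[3][4] = 24.0                # one single-pixel warm spot
frame[0][4] = 23.0                # only 1.0 degree warmer: below threshold

blobs = find_warm_blobs(frame, background)
```

      Exposing the threshold and size limit as plain arguments mirrors the point made above: a heuristic first stage can be retuned for a new arena without collecting a new training set.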

      Are these choices robust across a range of laboratory environments?

      We tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      How much better are the demonstrated results compared to a simple object detection pipeline (i.e. FasterRCNN or YOLO on the raw heat images)?

      We compared the performance of DeePosit to a classic object detection algorithm: YOLOv8. We trained YOLOv8 both on a single image input (YOLOv8 Gray) and on 3 image inputs representing a sequence of three time points around the ground truth event (t): t+0, t+10, and t+30 seconds (YOLOv8 RGB). DeePosit achieved significantly better accuracy over both YOLOv8 alternatives. YOLOv8 RGB achieved better accuracy than YOLOv8 Gray, suggesting that temporal information is important for this task. It's worth mentioning that while YOLOv8 requires the annotator to draw rectangles surrounding each urine spot or feces as part of the training set, our algorithm training set used just a single click inside each spot, allowing faster generation of training sets.
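
      The three-time-point input described above can be pictured as stacking three grayscale thermal frames into the channels of a single image. The sketch below assumes the frames are held in a list sampled at a fixed rate. The function name `stack_time_points`, the `fps` value, and the clamping at the end of the video are assumptions made for illustration, not details taken from the manuscript.

```python
def stack_time_points(frames, event_idx, fps, offsets_s=(0, 10, 30)):
    """Build an H x W x 3 image from the frames at event time + each offset.

    frames: list of 2D lists (grayscale thermal frames), one per time step.
    event_idx: index of the frame at the ground-truth event time t.
    fps: frames per second (assumed constant).
    Offsets that run past the end of the video are clamped to the last
    frame (a choice made for this sketch only).
    """
    last = len(frames) - 1
    picks = [frames[min(event_idx + int(round(off * fps)), last)]
             for off in offsets_s]
    rows, cols = len(picks[0]), len(picks[0][0])
    # Channel k of each pixel comes from the k-th selected frame.
    return [[[p[r][c] for p in picks] for c in range(cols)]
            for r in range(rows)]


# Toy video: 40 frames of 2x2 pixels at 1 frame per second,
# where every pixel of frame i holds the value i.
video = [[[i, i], [i, i]] for i in range(40)]
image = stack_time_points(video, event_idx=5, fps=1)
```

      Packing time into the channel axis lets a single-image detector see how a spot changes over time, which is consistent with the reported gain of YOLOv8 RGB over YOLOv8 Gray.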

      The method is implemented with a mix of MATLAB and Python.

      That is right.

      One proposed reason why this method is better than a human annotator is that it "is not biased." While they may mean it isn't influenced by what the researcher wants to see, the model they present is still statistically biased since each object class has a different recall score. This wasn't investigated. In general, there was little discussion of the quality of the model. 

      We tested the consistency of the algorithm performance across gender, space, time and experiment type (SP, SxP, and ESP). We found consistent levels of performance across all categories (see Figure 3), suggesting that errors made by the algorithm are uniform across conditions, hence should not create any bias of the results. Specifically, the detection accuracy is similar between urine and feces, hence should not impose a bias between the various object classes.

      Precision scores were not reported.

      In the revised paper we report recall, precision, and F1 scores in the caption of the relevant figures and also supply Excel files with the full statistics for each of the figures.
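
      For readers less familiar with these three metrics, the short sketch below shows how they follow from counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not taken from the manuscript or its Excel files.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 correct detections, 10 false alarms, 20 misses.
p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
```

      Because F1 combines both error types, a detector with high recall but many false alarms (or the reverse) cannot reach a high F1, which is why reporting all three values is more informative than recall alone.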

      Is a recall value of 78.6% good for the types of studies they and others want to carry out? What are the implications of using the resulting data in a study?

      We have trained the model better, using a larger number of video clips. The increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88. 

      How do these results compare to the data that would be generated by a "biased human?"

      We further evaluated the accuracy of the DeePosit algorithm in comparison to a second human annotator and found that the algorithm accuracy is comparable to human-level accuracy (Figure 3).

      5 out of the 6 figures in the paper relate not to the method but to results from a study whose data was generated from the method. This makes a paper, which, based on the title, is about the method, much longer and more complicated than if it focused on the method.

      We appreciate the reviewer's comment, but the analysis of this new dataset by DeePosit demonstrates how the algorithm may be used to reveal novel and distinguishable dynamics of urination and defecation activities during social interactions, which were not yet reported. 

      Also, even in the context of the experiments, there is no discussion of the implications of analyzing data that was generated from a method with precision and recall values of only 70-80%. Surely this noise has an effect on how to correctly calculate p-values etc. Instead, the authors seem to proceed like the generated data is simply correct.

      As mentioned above, the increase in training set size improved the overall accuracy from a mean F1 score of 0.81 in the previous version to a mean F1 score of 0.891 (see Figure 2 and Figure 3) in the current version. Specifically, the F1 score for urine detection was improved from 0.79 to 0.88.  

      Reviewer #3 (Public Review):

      Summary:

      The authors introduce a tool that employs thermal cameras to automatically detect urine and feces deposits in rodents. The detection process involves a heuristic to identify potential thermal regions of interest, followed by a transformer network-based classifier to differentiate between urine, feces, and background noise. The tool's effectiveness is demonstrated through experiments analyzing social preference, stress response, and temporal dynamics of deposits, revealing differences between male and female mice.

      Strengths:

      The method effectively automates the identification of deposits

      The application of the tool in various behavioral tests demonstrates its robustness and versatility.

      The results highlight notable differences in behavior between male and female mice

      We thank the reviewer for the positive assessment of our work.

      Weaknesses:

      The definition of 'start' and 'end' periods for statistical analysis is arbitrary. A robustness check with varying time windows would strengthen the conclusions.

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).
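
      The robustness check described above amounts to recounting events under two window lengths measured from the start of a stage. A minimal sketch, assuming event times are given in seconds from the start of the recording; the function name, stage start time, and event times are hypothetical.

```python
def count_events_in_window(event_times_s, stage_start_s, window_min):
    """Count events with time in [stage_start, stage_start + window)."""
    end_s = stage_start_s + window_min * 60
    return sum(stage_start_s <= t < end_s for t in event_times_s)


# Hypothetical event times (seconds); the stage is assumed to start at 300 s.
events = [310, 420, 530, 545, 590]
n_4min = count_events_in_window(events, stage_start_s=300, window_min=4)
n_5min = count_events_in_window(events, stage_start_s=300, window_min=5)
```

      Running the same statistical test on both counts is one simple way to check whether a conclusion depends on the chosen window.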

      The paper could better address the generalizability of the tool to different experimental setups, environments, and potentially other species.

      As mentioned above, we tested the algorithm performance on a second strain of mice (male C57BL/6) in a different environmental condition (white arena instead of a black one) and found that the algorithm achieves comparable accuracy, even though C57BL/6 mice and white arena were not included in the training set. Thus, the algorithm seems to be robust and efficient across various experimental conditions.

      The results are based on tests of individual animals, and there is no discussion of how this method could be generalized to experiments tracking multiple animals simultaneously in the same arena (e.g., pair or collective behavior tests, where multiple animals may deposit urine or feces).

      At the moment, the algorithm cannot be applied for multiple animals freely moving in the same arena. However, in the revised manuscript we explicitly discussed what is needed for adapting the algorithm to perform such analyses.

      Recommendations for the authors: 

      -  Add a note and/or perform additional calculations to show that the results do not depend on the specific definitions of 'start' and 'end' periods. For instance, vary the time window thresholds and recalculate the statistics using different windows (e.g., 1-5 minutes instead of 1-4 minutes).

      In all the statistical tests conducted in the revised manuscript, we have used a time period of 4 minutes for the analysis. We did not use the last minute of each stage for the analysis since the input of DeePosit requires 1 minute of video after the event. Nevertheless, we also conducted the same tests using a 5-minute period and found similar results (Figure 5—Figure Supplement 1).

      - Condense Figures 4, 5, and 6 to simplify the presentation. Focus on demonstrating the effectiveness of the tool rather than detailed experimental outcomes, as the primary contribution of this paper is methodological.

      We have added to the revised manuscript one technical figure (Figure 3) comparing the accuracy of the algorithm performance across gender, space, time, and experiment type (SP, SxP, and ESP) as well as comparing its performance to a second human annotator and to YOLOv8. One more partially technical figure (Figure 5) compares the results of the algorithm between white ICR mice in the black arena and black C57BL/6 mice in the white arena. Thus, only Figures 4 and 6 show detailed experimental outcomes.

      - Provide more detail on how the preliminary detection procedure and parameters might need adjustment for different experimental setups or conditions. Discuss potential adaptations for field settings or more complex environments.

      As for the algorithm parameters, we tested the effect of the main parameter of the preliminary detection (the temperature threshold for the detection of a new blob) and found that a threshold of 1.6°C gave the best accuracy and used this parameter for all of the experiments instead of 1.1°C which was used in the original manuscript. It's worth mentioning that the performance is quite stable (mean F1 score of 0.88-0.89) for the thresholds between 1.1°C and 3°C.

      We also checked if changing the input length of the video clip that is fed to the classifier affects the accuracy by training the classifier with -11..30 seconds video clips (41 seconds in total) instead of -11..60 seconds (71 seconds in total) and found no difference in accuracy. 

      Overall, the algorithm's accuracy seems to be rather stable across various choices of parameters.

      Editor's note:

      Should you choose to revise your manuscript, please ensure your manuscript includes full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We have deposited the detailed statistics of each figure in https://github.com/davidpl2/DeePosit/tree/main/FigStat/PostRevision

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates how hearing impairment affects neural encoding of speech, in particular the encoding of hierarchical linguistic information. The current analysis provides incomplete evidence that hearing impairment affects speech processing at multiple levels, since the novel analysis based on HM-LSTM needs further justification. The advantage of this method should also be further explained. The study can also benefit from building a stronger link between neural and behavioral data.

      We sincerely thank the editors and reviewers for their detailed and constructive feedback.

      We have revised the manuscript to address all of the reviewers’ comments and suggestions. The primary strength of our methods lies in the use of the HM-LSTM model, which simultaneously captures linguistic information at multiple levels, ranging from phonemes to sentences. As such, this model can be applied to other questions regarding hierarchical linguistic processing. We acknowledge that our current behavioral results from the intelligibility test may not fully differentiate between the perception of lower-level acoustic/phonetic information and higher-level meaning comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. We aim to explore this connection further in future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards. I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials. 

      It is my understanding that keeping the repository private during the review process and making them public after acceptance is standard practice. As far as I understand, although the OSF repository was private, anyone with the link should be able to access it. I have now made the repository public.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then give the R2 coefficient of determination between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      Model fit was measured by spatiotemporal cluster permutation tests (Maris & Oostenveld, 2007) on the contrasts of the timecourses of the z-transformed coefficient of determination (R<sup>2</sup>). For instance, to assess whether words from the attended stimuli better predict EEG signals during the mixed speech compared to words from the unattended stimuli, we used the 150-dimensional vectors corresponding to the word layer from our LSTM model for the attended and unattended stimuli as regressors. We then fit these regressors to the EEG signals at 9 time points (spanning -100 ms to 300 ms around the sentence offsets, with 50 ms intervals). We then conducted one-tailed two-sample t-tests to determine whether the differences in the contrasts of the R<sup>2</sup> timecourses were statistically significant. Note that we did not perform TRF analyses. We have clarified this description in the “Spatiotemporal clustering analysis” section of the “Methods and Materials” on p.10 of the manuscript.

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? 

      The original HM-LSTM model developed by Chung et al. (2017) consists of only two levels: the word level and the phrase level (Figure 1b from their paper). By “extending” the model, we mean that we expanded its architecture to include five levels: phoneme, syllable, word, phrase, and sentence. Since our input consists of phoneme embeddings, we cannot directly apply their model, so we trained our model on the WenetSpeech corpus (Zhang et al., 2021), which provides phoneme-level transcripts. We have added this clarification on p.4 of the manuscript.

      • And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? 

      Yes, we extracted the 2048-dimensional hidden layer activity from the model to represent features for each sentence in our speech stimuli at the phoneme, syllable, word, phrase and sentence levels. However, we did not perform any TRF deconvolution. Instead, we fit these features (reduced to 150 dimensions using PCA) to the EEG signals at 9 time points around the offset of each sentence using ridge regression. We have now added a multivariate TRF (mTRF) analysis following Reviewer 3’s suggestions, and the results showed similar patterns to the current results (see Figure S2). We have added the clarification in the “Ridge regression at different time latencies” section of the “Methods and Materials” on p.10 of the manuscript.

      Results from the mTRF analyses were added on p.7 of the manuscript.

      • A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.

      The linguistic regressors are just 5 150-dimensional vectors, each corresponding to one linguistic level, as shown in Figure 1B.

      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.

      No, these regressors were not like that. They were 150-dimensional vectors (after PCA dimension reduction) extracted from the hidden layers of the HM-LSTM model. After training the model on the WenetSpeech corpus, we ran it on our speech stimuli and extracted representations from the five hidden layers to correspond to the five linguistic levels. As mentioned earlier, we did not perform TRF analyses; instead, we used ridge regression to predict EEG signals around the offset of each sentence, a method commonly employed in the literature (e.g., Caucheteux & King, 2022; Goldstein et al., 2022; Schmitt et al., 2021; Schrimpf et al., 2021). For instance, Goldstein et al. (2022) used word embeddings from GPT-2 to predict ECoG activity surrounding the onset of each word during naturalistic listening. We have cited these references on p.3 of the manuscript, and the method is illustrated in Figure 1B.

      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?

      All the regressors are represented as 2048-dimensional vectors derived from the hidden layers of the trained HM-LSTM model. We applied the trained model to all 284 sentences in our stimulus text, generating a set of 284 × 2048-dimensional vectors. Next, we performed Principal Component Analysis (PCA) on the 2048 dimensions and extracted the first 100 principal components (PCs), resulting in 284 × 100-dimensional vectors for each regressor. These 284 × 100 matrices were then flattened into 28,400-dimensional vectors. Subsequently, we computed the correlation matrix for the z-transformed 28,400-dimensional vectors of our five linguistic regressors. The code for this analysis, lstm_corr.py, can be found in our OSF repository. We have added a section “Correlation among linguistic features” in “Materials and Methods” on p.10 of the manuscript.
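      The correlation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' lstm_corr.py: the hidden-layer activity is replaced by random stand-in data, PCA is done with a plain SVD, and the function names `pca_reduce` and `level_correlations` are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the first n_components principal axes."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def level_correlations(features, n_components=100):
    """features: dict mapping level name -> (n_sentences, n_hidden) array."""
    flat = []
    for name in features:
        reduced = pca_reduce(features[name], n_components)  # 284 x 100
        v = reduced.ravel()                                 # 28,400 values
        flat.append((v - v.mean()) / v.std())               # z-transform
    return np.corrcoef(np.vstack(flat))                     # 5 x 5 matrix

rng = np.random.default_rng(0)
levels = ["phoneme", "syllable", "word", "phrase", "sentence"]
# Random stand-ins for hidden-layer activity (284 sentences x 2048 units).
features = {lv: rng.standard_normal((284, 2048)) for lv in levels}
corr = level_correlations(features)
```

      With real hidden-layer activity in place of the random arrays, the off-diagonal entries of `corr` would correspond to the between-level coefficients discussed here.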

      We consider the observed coefficients of 0.17 and 0.22 to be relatively low compared to prior model-brain alignment studies which report correlation coefficients above 0.5 for linguistic regressors (e.g., Gao et al., 2024; Sugimoto et al., 2024). In Chinese, a single syllable can also function as a word, potentially leading to higher correlations between regressors for syllables and words. However, we refrained from overinterpreting the results to suggest a higher correlation between syllable and sentence compared to syllable and word. A paired t-test of the syllable-word coefficients versus syllable-sentence coefficients across the 284 sentences revealed no significant difference (t(28399)=-3.96, p=1). We have incorporated this information into p.5 of the manuscript.

      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.

      All the regressors are aligned to 9 time points surrounding sentence offsets (-100 ms to 300 ms with a 50 ms interval). This is because all our regressors are taken from the HM-LSTM model, where the input is the phoneme representation of a sentence (e.g., “zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4”). For each unit in the sentence, the model generates five 2048-dimensional vectors, each corresponding to one of the five linguistic levels of the entire sentence. We have added the clarification on p.11 of the manuscript.

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?

      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.

      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.

      Thank you very much for pointing this out. All instances of “sentence onset” were typos and should be corrected to “sentence offset.” We chose offset because the regressors are derived from the hidden layer activity of our HM-LSTM model, which processes the entire sentence before generating outputs. We have now corrected all the typos. In continuous speech, there is no distinct silence period following sentence offsets. Additionally, lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Therefore, we included a 300 ms interval after sentence offsets in our analysis, as our regressors encompass linguistic levels up to the sentence level. We have added this motivation on p.11 of the manuscript.

      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.

      We completely agree, and thank you very much for the suggestion. We have now added this information to Figures 4-6.

      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?

      As mentioned earlier, we did not perform TRF analyses or convolve the regressors. Instead, we conducted regression analyses at each of the 9 time points surrounding the sentence offsets, following standard methods commonly used in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022). The time window of -100 to 300 ms was selected based on prior findings that lexical and phrasal processing typically occurs 200–300 ms after word offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (cf. Gwilliams et al., 2022). We have added the clarification on p. of the manuscript.

      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.

      The rationale for choosing sentence offsets instead of onsets is that we are aligning the HM-LSTM model’s activity with EEG responses, and the input to the model consists of phoneme representations of the entire sentence at one time. In other words, the model needs to process the whole sentence before generating representations at each linguistic level. Therefore, the corresponding EEG responses should also align with the sentence offsets, occurring after participants have heard the complete sentence. The ridge regression followed the common practice in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021), and the time window is not cherry-picked but based on prior literature reporting lexical and sublexical processing in these time periods (e.g., Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Gwilliams et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021).

      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.

      The intersecting lines in Figures 5 and 6 represent the significant time windows for within-group comparisons (i.e., significant model fit compared to 0). They do not depict between-group comparisons, as no significant contrasts were found between the groups. For example, in Figure 1, the significant time windows for the acoustic models are shown separately for the hearing-impaired and normal-hearing groups. No significant differences were observed, as indicated by the sensor topography. We have now clarified this point in the captions for Figures 5 and 6.

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?

      The ridge regression was performed using custom Python code, making heavy use of the sklearn (v1.12.0) package. We used ridge regression instead of ordinary least squares regression because all our linguistic regressors are 150-dimensional dense vectors, and our acoustic regressors are 130-dimensional vectors (see “Acoustic features of the speech stimuli” in “Materials and Methods”). We kept the default regularization parameter (i.e., 1). This ridge regression method is commonly used in model-brain alignment studies, where the regressors are high-dimensional vectors taken from language models (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). The code ridge_lstm.py can be found in our OSF repository, and we have added a more detailed description on p.11 of the manuscript.
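      To make the per-latency procedure concrete, here is a minimal sketch under stated assumptions; it is not the authors' ridge_lstm.py. It uses a NumPy closed-form ridge solution in place of the sklearn call, synthetic data in place of EEG, and a simple train/test split that the response does not specify. The array shapes, the sensor count, and the helper names `ridge_fit` and `r2_per_sensor` are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ Yc)
    return coef, y_mean - x_mean @ coef

def r2_per_sensor(Y_true, Y_pred):
    """Coefficient of determination, computed separately for each sensor."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Nine latencies around sentence offset: -100 ms to 300 ms in 50 ms steps.
latencies_ms = np.arange(-100, 301, 50)

rng = np.random.default_rng(0)
n_sentences, n_features, n_sensors = 284, 150, 4  # sensor count is illustrative
X = rng.standard_normal((n_sentences, n_features))
eeg = rng.standard_normal((n_sentences, len(latencies_ms), n_sensors))
# Inject a linear effect at the 200 ms latency only (index 6).
eeg[:, 6, :] += X @ rng.standard_normal((n_features, n_sensors))

train, test = slice(0, 200), slice(200, None)  # assumed split, for illustration
r2 = np.empty((len(latencies_ms), n_sensors))
for i in range(len(latencies_ms)):
    coef, intercept = ridge_fit(X[train], eeg[train, i, :], alpha=1.0)
    r2[i] = r2_per_sensor(eeg[test, i, :], X[test] @ coef + intercept)
```

      Each row of `r2` is one point on the R<sup>2</sup> timecourse; in this synthetic example only the 200 ms row carries a real effect.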

      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.

      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R2 between predicted and actual EEG, which is the coefficient of determination, correct? If this is R2, how is it that you have some negative numbers in some of the plots? R2 should be only positive, between 0 and 1.

      Yes, we reduced the 2048-dimensional vectors for each of the 5 linguistic levels to 150 dimensions using PCA, mainly to save computational resources. We used ridge regression, following the standard practice in the field (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021).

      Yes, the regression outcomes are the R<sup>2</sup> values representing the fit between the predicted and actual EEG data. However, we reported normalized R<sup>2</sup> values which are z-transformed in the plots. All our spatiotemporal cluster permutation analyses were conducted using the z-transformed R<sup>2</sup> values. We have added this clarification both in the figure captions and on p.11 of the manuscript. As a side note, R<sup>2</sup> values can be negative because they are not the square of a correlation coefficient. Rather, R<sup>2</sup> compares the fit of the chosen model to that of a horizontal straight line (the null hypothesis). If the chosen model fits the data worse than the horizontal line, then the R<sup>2</sup> value becomes negative: https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative
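      A small numerical illustration of the side note on negative R<sup>2</sup>, using made-up numbers rather than data from the study: the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares, so any prediction that is worse than the mean of the data yields a negative value.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.1, 1.9, 3.2, 3.8])   # close to the data
flat = np.full_like(y, y.mean())        # the horizontal-line baseline
bad = np.array([4.0, 3.0, 2.0, 1.0])    # worse than the baseline

r2_good = r2(y, good)   # 1 - 0.1 / 5  = 0.98
r2_flat = r2(y, flat)   # 1 - 5 / 5    = 0.0
r2_bad = r2(y, bad)     # 1 - 20 / 5   = -3.0
```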

      Reviewer #2 (Public Review):

      This study compares neural responses to speech in normal-hearing and hearing-impaired listeners, investigating how different levels of the linguistic hierarchy are impacted across the two cohorts, both in a single-talker and multi-talker listening scenario. It finds that, while normal-hearing listeners have a comparable cortical encoding of speech-in-quiet and attended speech from a multi-talker mixture, participants with hearing impairment instead show a reduced cortical encoding of speech when it is presented in a competing listening scenario. When looking across the different levels of the speech processing hierarchy in the multi-talker condition, normal-hearing participants show a greater cortical encoding of the attended compared to the unattended stream in all speech processing layers - from acoustics to sentence-level information. Hearing-impaired listeners, on the other hand, only have increased cortical responses to the attended stream for the word and phrase levels, while all other levels do not differ between attended and unattended streams.

      The methods for modelling the hierarchy of speech features (HM-LSTM) and the relationship between brain responses and specific speech features (ridge-regression) are appropriate for the research question, with some caveats on the experimental procedure. This work offers an interesting insight into the neural encoding of multi-talker speech in listeners with hearing impairment, and it represents a useful contribution towards understanding speech perception in cocktail-party scenarios across different hearing abilities. While the conclusions are overall supported by the data, there are limitations and certain aspects that require further clarification.

      (1) In the multi-talker section of the experiment, participants were instructed to selectively attend to the male or the female talker, and to rate the intelligibility, but they did not have to perform any behavioural task (e.g., comprehension questions, word detection or repetition), which could have demonstrated at least an attempt to comply with the task instructions. As such, it is difficult to determine whether the lack of increased cortical encoding of Attended vs. Unattended speech across many speech features in hearing-impaired listeners is due to a different attentional strategy, which might be more oriented at "getting the gist" of the story (as the increased tracking of only word and phrase levels might suggest), or instead it is due to hearing-impaired listeners completely disengaging from the task and tuning back in for selected key-words or word combinations. Especially the lack of Attended vs. Unattended cortical benefit at the level of acoustics is puzzling and might indicate difficulties in performing the task. I think this caveat is important and should be highlighted in the Discussion section.

      Thank you very much for the suggestion. We admit that the hearing-impaired listeners might adopt different attentional strategies or potentially disengage from the task due to comprehension difficulties. However, we would like to emphasize that our hearing-impaired participants have extended high-frequency (EHF) hearing loss, with impairment only at frequencies above 8 kHz. Their condition is likely not severe enough to cause them to adopt a markedly different attentional strategy for this task. Moreover, it is possible that our normal-hearing listeners may also adopt varying attentional strategies, yet the comparison still revealed notable differences. We have added the caveat in the Discussion section on p.8 of the manuscript.

      (2) In the EEG recording and preprocessing section, you state that the EEG was filtered between 0.1Hz and 45Hz. Why did you choose this very broadband frequency range? In the literature, speech responses are robustly identified between 0.5Hz/1Hz and 8Hz. Would these results emerge using a narrower and lower frequency band? Considering the goal of your study, it might also be interesting to run your analysis pipeline on conventional frequency bands, such as Delta and Theta, since you are looking into the processing of information at different temporal scales.

      Indeed, we have decomposed the epoched EEG time series for each section into six classic frequency band components (delta 1–3 Hz, theta 4–7 Hz, alpha 8–12 Hz, beta 12–20 Hz, gamma 30–45 Hz) by convolving the data with complex Morlet wavelets as implemented in MNE-Python (version 0.24.0). The number of cycles in the Morlet wavelets was set to frequency/4 for each frequency bin. The power values for each time point and frequency bin were obtained by taking the square root of the resulting time-frequency coefficients. These power values were normalized to reflect relative changes (expressed in dB) with respect to the 500 ms pre-stimulus baseline. This yielded a power value for each time point and frequency bin for each section. We specifically examined the delta and theta bands, and computed the correlation between the regression outcomes for the five linguistic predictors from these bands and those obtained using data from all frequency bands (the R<sup>2</sup> arrays, of shape subjects × sensors × time points, were flattened before computing the correlation). The results showed high correlation coefficients (see the correlation matrix in Supplementary Figures S2 for the attended and unattended speech). Therefore, we opted to use the epoched EEG data from all frequency bands for our analyses. We have added this clarification in the Results section on p.5 and the “EEG recording and preprocessing” section in “Materials and Methods” on p.11 of the manuscript.

      (3) A paragraph with more information on the HM-LSTM would be useful to understand the model used without relying on the Chung et al. (2017) paper. In particular, I think the updating mechanism of the model should be clarified. It would also be interesting to modify the updating factor of the model, along the lines of Schmitt et al. (2021), to assess whether a HM-LSTM with faster or slower updates can better describe the neural activity of hearing-impaired listeners. That is, perhaps the difference between hearing-impaired and normal-hearing participants lies in the temporal dynamics, and not necessarily in a completely different attentional strategy (or disengagement from the stimuli, as I mentioned above).

      Thank you for the suggestion. We have added more details on our HM-LSTM model on p.10 “Hierarchical multiscale LSTM model” in “Materials and Methods”: Our HM-LSTM model consists of 4 layers. At each layer, the model implements a COPY or UPDATE operation at each time step t. The COPY operation maintains the current cell state without any changes until it receives a summarized input from the lower layer. The UPDATE operation occurs when a linguistic boundary is detected in the layer below, but no boundary was detected at the previous time step t-1. In this case, the cell updates its summary representation, similar to standard RNNs. We agree that exploring modifications to the model’s updating factor would be an interesting direction. However, since we have already observed contrasts between normal-hearing and hearing-impaired listeners using the current model’s update parameters, we believe discussing additional hypotheses would overextend the scope of this paper.
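      The COPY/UPDATE rule described above can be illustrated with a toy example. This is a deliberately simplified sketch, not the HM-LSTM itself: boundary flags are supplied by hand rather than learned, the update is a plain tanh recurrence instead of an LSTM cell, and any other operations of the full model are ignored. The function name `run_upper_layer` and all dimensions are hypothetical.

```python
import numpy as np

def run_upper_layer(lower_states, lower_boundaries, own_boundaries, W, U):
    """Apply COPY or UPDATE to one layer's state at every time step."""
    h = np.zeros(W.shape[0])
    states = []
    for t, h_low in enumerate(lower_states):
        own_prev = own_boundaries[t - 1] if t > 0 else 0
        if lower_boundaries[t] == 1 and own_prev == 0:
            # UPDATE: the layer below closed a unit, so absorb its summary.
            h = np.tanh(W @ h + U @ h_low)
        # Otherwise COPY: the state is carried forward unchanged.
        states.append(h.copy())
    return np.array(states)

rng = np.random.default_rng(0)
hidden, steps = 8, 5
W = 0.1 * rng.standard_normal((hidden, hidden))
U = 0.1 * rng.standard_normal((hidden, hidden))
lower_states = rng.standard_normal((steps, hidden))
lower_boundaries = [0, 0, 1, 0, 1]   # hand-set boundaries in the layer below
own_boundaries = [0, 0, 0, 0, 0]     # no boundaries detected in this layer
states = run_upper_layer(lower_states, lower_boundaries, own_boundaries, W, U)
```

      In this toy run the state changes only at steps 2 and 4, where the lower layer signals a boundary, and is copied at every other step.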

      (4) When explaining how you extracted phoneme information, you mention that "the inputs to the model were the vector representations of the phonemes". It is not clear to me whether you extracted specific phonetic features (e.g., "p" sound vs. "b" sound), or simply the phoneme onsets. Could you clarify this point in the text, please?

      The model inputs were individual phonemes from two sentences, each transformed into a 1024-dimensional vector using a simple lookup table. This lookup table stores embeddings for a fixed dictionary of all unique phonemes in Chinese. This approach is a foundational technique in many advanced NLP models, enabling the representation of discrete input symbols in a continuous vector space. We have added this clarification on p.10 of the manuscript.
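      As a minimal sketch of such a lookup table: the embedding values below are random stand-ins (in a trained model they would be learned), and the vocabulary is built only from the example sentence quoted earlier in this response rather than from the full Chinese phoneme inventory.

```python
import numpy as np

# Phoneme sequence of the example sentence quoted earlier in this response.
sentence = "zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4".split()

# Fixed dictionary: each unique phoneme gets one row index.
vocab = {p: i for i, p in enumerate(sorted(set(sentence)))}

rng = np.random.default_rng(0)
embedding_dim = 1024
# Random stand-in for the learned embedding table (one row per phoneme).
table = rng.standard_normal((len(vocab), embedding_dim))

ids = np.array([vocab[p] for p in sentence])
embedded = table[ids]   # shape: (number of phonemes, 1024)
```

      Identical phonemes map to identical vectors (the three occurrences of "y" share one row), so the input encodes phoneme identity rather than only phoneme onsets.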

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. To address this limitation, the authors should consider evaluating alternative models and methods. For example, directly using spectrograms, discrete phoneme/syllable/word coding as features, and performing feature-based temporal response function (TRF) analysis could serve as valuable baseline models. This approach would provide a more comprehensive evaluation of the neural encoding of linguistic information.

      Our acoustic features are indeed directly the broadband envelopes and the log-mel spectrograms of the speech streams. The amplitude envelope of the speech signal was extracted using the Hilbert transform. The 129-dimensional spectrogram and 1-dimensional envelope were concatenated to form a 130-dimensional acoustic feature at every 10 ms of the speech stimuli. Given the duration of our EEG recordings, which span over 10 minutes, conducting multivariate TRF (mTRF) analysis with such high-dimensional predictors was not feasible. Instead, we used ridge regression to predict EEG responses across 9 temporal latencies, ranging from -100 ms to +300 ms, with additional 50 ms latencies surrounding sentence offsets. To evaluate the model's performance, we extracted the R<sup>2</sup> values at each latency, providing a temporal profile of regression performance over the analyzed time period. This approach is conceptually similar to TRF analysis.
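      The envelope extraction and concatenation step can be sketched as below, under stated assumptions: the analytic signal is computed with a NumPy FFT (the standard construction of the Hilbert-transform envelope), the input is a synthetic amplitude-modulated tone rather than speech, the log-mel spectrogram is a zero-filled placeholder, and averaging within 10 ms frames is one possible way to reach the 10 ms resolution. The helper names are hypothetical.

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope as the magnitude of the FFT-based analytic signal."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0      # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def frame_mean(x, samples_per_frame):
    """Average consecutive samples into non-overlapping frames."""
    n_frames = len(x) // samples_per_frame
    trimmed = x[:n_frames * samples_per_frame]
    return trimmed.reshape(n_frames, samples_per_frame).mean(axis=1)

fs = 1000                                   # illustrative sampling rate (Hz)
t = np.arange(fs) / fs                      # one second of signal
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 3 * t)
signal = modulator * np.cos(2 * np.pi * 100 * t)

envelope = hilbert_envelope(signal)
env_frames = frame_mean(envelope, samples_per_frame=fs // 100)  # 10 ms frames
spectrogram = np.zeros((len(env_frames), 129))  # placeholder for log-mel bins
acoustic = np.hstack([spectrogram, env_frames[:, None]])        # frames x 130
```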

      We agree that including baseline models for the linguistic features is important, and we have now added results from mTRF analysis using phoneme, syllable, word, phrase, and sentence rates as discrete predictors (i.e., marking a value of 1 at each unit boundary offset). Our EEG data span the entire 10-minute duration for each condition, sampled at 10-ms intervals. The TRF results for our main comparison, attended versus unattended conditions, showed similar patterns to those observed using features from our HM-LSTM model. At the phoneme and syllable levels, normal-hearing listeners showed marginally significantly higher TRF weights for attended speech compared to unattended speech at approximately -80 to 150 ms after phoneme offsets (t=2.75, Cohen’s d=0.87, p=0.057), and 120 to 210 ms after syllable offsets (t=3.96, Cohen’s d=0.73, p=0.083). At the word and phrase levels, normal-hearing listeners exhibited significantly higher TRF weights for attended speech compared to unattended speech at 190 to 290 ms after word offsets (t=4, Cohen’s d=1.13, p=0.049), and around 120 to 290 ms after phrase offsets (t=5.27, Cohen’s d=1.09, p=0.045). For hearing-impaired listeners, marginally significant effects were observed at 190 to 290 ms after word offsets (t=1.54, Cohen’s d=0.6, p=0.059), and 180 to 290 ms after phrase offsets (t=3.63, Cohen’s d=0.89, p=0.09). These results have been added on p.7 of the manuscript, and the corresponding figure is included as Supplementary F2.
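
      To make the "rate" predictors concrete, the sketch below builds an impulse train that marks a 1 at each unit boundary offset on a 10-ms grid, together with the time-lagged design matrix on which a TRF is estimated. The function names and the example offsets are hypothetical and are not taken from the study's code.

```python
def rate_predictor(offsets_ms, duration_ms, step_ms=10):
    """Impulse train with a 1 at the sample nearest to each unit offset."""
    n = duration_ms // step_ms
    x = [0.0] * n
    for t in offsets_ms:
        idx = round(t / step_ms)
        if 0 <= idx < n:
            x[idx] = 1.0
    return x


def lagged_design(x, lags):
    """Row t holds x[t - lag] for each lag in samples (zero-padded edges).

    A TRF models the response as EEG[t] = sum over lags of w[lag] * x[t - lag],
    so regressing the EEG on these rows yields one weight per lag.
    """
    n = len(x)
    return [[x[t - lag] if 0 <= t - lag < n else 0.0 for lag in lags]
            for t in range(n)]


# Hypothetical example: two word offsets at 120 ms and 300 ms in a 500 ms clip.
word_rate = rate_predictor([120, 300], 500)
design = lagged_design(word_rate, [-1, 0, 1])  # lags of -10, 0 and +10 ms
```

      One such predictor would be built for each linguistic level (phoneme, syllable, word, phrase, sentence), and the lagged columns concatenated before fitting.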

      It is not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. Specifically, the results presented in Figure 3C are somewhat confusing. While the phonemes are labeled, the syllables, words, phrases, and sentences are not, making it difficult to interpret how the model distinguishes between these levels of linguistic information. The claim that "Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels" is not convincingly supported by the provided visualizations. To strengthen their argument, the authors should use more quantified metrics to demonstrate that the model indeed captures phrase, word, syllable, and phoneme information at different layers. This is a crucial prerequisite for the subsequent analyses and claims about the hierarchical processing of linguistic information in the brain.

      Quantitative measures such as mutual information, clustering metrics, or decoding accuracy for each linguistic level could provide clearer evidence of the model's effectiveness in this regard.

      In Figure 3C, we used color-coding to represent the activity of five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The results demonstrate that the phoneme layer effectively distinguishes different phonemes, while the higher linguistic layers do not. We believe these findings provide evidence that different layers capture distinct linguistic information. Additionally, we computed the correlation coefficients between each pair of linguistic predictors, as shown in Figure 3B. We think this analysis serves a similar purpose to computing the mutual information between pairs of hidden-layer activities for our constructed sentences. Furthermore, the mTRF results based on rate models of the linguistic features we presented earlier align closely with the regression results using the hidden-layer activity from our HM-LSTM model. This further supports the conclusion that our model successfully captures relevant information across these linguistic levels. We have added the clarification on p.5 of the manuscript.

      The formulation of the regression analysis is somewhat unclear. The choice of sentence offsets as the anchor point for the temporal analysis, and the focus on the [-100ms, +300ms] interval, needs further justification. Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time. Additionally, explaining the rationale behind choosing this specific time window and how it aligns with the temporal dynamics of speech processing would enhance the clarity and validity of the regression analysis.

      Thank you for pointing this out. We chose this time window as lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (e.g., Gwilliams et al., 2022). Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. We have added this clarification on p.12 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As I mentioned, I think the OSF repo needs to be changed to give anyone access. I would recommend pursuing the lines of thought I mentioned in the public review to make this study complete and to allow it to fit into the already existing literature to facilitate comparisons.

      Yes, the OSF folder is now public. We have made revisions following all reviewers’ suggestions.

      There are some typos in figure labels, e.g. 2B.

      Thank you for pointing it out! We have now corrected the typo in Figure 2B.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was able to access all of the audio files and code for the study, but no EEG data was shared in the OSF repository. Unless there is some ethical and/or legal constraint, my understanding of eLife's policy is that the neural data should be made publicly available as well.

      The preprocessed EEG data in .npy format are available in the OSF repository.

      (2) The line-plots in Figures 4B, 5B, and 6B have very similar colours. They would be easier to interpret if you changed the line appearance as well as the colours. E.g., dotted line for hearing-impaired listeners, thick line for normal-hearing.

      Thank you for the suggestion! We have now used thicker lines for normal-hearing listeners in all our line plots.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors may consider presenting raw event-related potentials (ERPs) or spatiotemporal response profiles before delving into the more complex regression encoding analysis. This would provide a clearer foundational understanding of the neural activity patterns. For example, it is not clear if the main claims, such as the neural activity in the normal-hearing group encoding phonetic information in attended speech better than in unattended speech, are directly observable. Showing ERP differences or spatiotemporal response pattern differences could support these claims more straightforwardly. Additionally, training pattern classifiers to test if different levels of information can be decoded from EEG activity in specific groups could provide further validation of the findings.

      We have now included results from more traditional mTRF analyses using phoneme, syllable, word, phrase, and sentence rates as baseline models (see p.7 of the manuscript and Figure S3). The results show similar patterns to those observed in our current analyses. While we agree that classification analyses would be very interesting, our regression analyses have already demonstrated distinct EEG patterns for each linguistic level. Consequently, classification analyses would likely yield similar results unless a different method for representing linguistic information at these levels is employed. To the best of our knowledge, no other computational model currently exists that can simultaneously represent these linguistic levels.

      (2) Is there any behavioral metric suggesting that these hearing-impaired participants do have deficits in comprehending long sentences? The self-rated intelligibility is useful, but cannot fully distinguish between perceiving lower-level phonetic information vs longer sentence comprehension.

      In the current study, we included only self-rated intelligibility tests. We acknowledge that this approach might not fully distinguish between the perception of lower-level phonetic information and higher-level sentence comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. Furthermore, our primary aim was to use the behavioral results to demonstrate that our hearing-impaired listeners experienced speech comprehension difficulties in multi-talker environments, while relying on the EEG data to investigate comprehension challenges at various linguistic levels.

      Minor:

      (1) Page 2, second line in Introduction, "Phonemes occur over ..." should be lowercase.

      According to APA format, the first word after the colon is capitalized if it begins a complete sentence (https://blog.apastyle.org/apastyle/2011/06/capitalization-after-colons.html). Here the sentence is a complete sentence, so we used uppercase for “phonemes”.

      (2) Page 8, second paragraph "...-100ms to 100ms relative to sentence onsets", should it be onsets or offsets?

      This is a typo; it should be offsets. We have now corrected it.

      References

      Bemis, D. K., & Pylkkanen, L. (2011). Simple composition: An MEG investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience, 31(8), 2801–2814.

      Gao, C., Li, J., Chen, J., & Huang, S. (2024). Measuring meaning composition in the human brain with composition scores from large language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11295–11308). Association for Computational Linguistics.

      Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., … Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), Article 3.

      Gwilliams, L., King, J.-R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13(1), Article 1.

      Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600), 453–458.

      Li, J., Lai, M., & Pylkkänen, L. (2024). Semantic composition in experimental and naturalistic paradigms. Imaging Neuroscience, 2, 1–17.

      Li, J., & Pylkkänen, L. (2021). Disentangling semantic composition and semantic association in the left temporal lobe. Journal of Neuroscience, 41(30), 6526–6538.

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

      Schmitt, L.-M., Erb, J., Tune, S., Rysop, A. U., Hartwigsen, G., & Obleser, J. (2021). Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49), eabi6070.

      Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.

      Sugimoto, Y., Yoshida, R., Jeong, H., Koizumi, M., Brennan, J. R., & Oseki, Y. (2024). Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. Neurobiology of Language, 5(1), 201–224.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The study presents some useful findings on Mendelian randomization-phenome-wide association, with BMI associated with health outcomes, and there is a focus on sex differences. Although there are some solid phenotype and genotype data, some of the data are incomplete and could be better presented, perhaps benefiting from more rigorous approaches. Confirmation and further assessment of the observed sex differences will add further value.

      Thank you for your positive comments. We have revised the analysis based on your feedback and that from the two reviewers. Specifically, we implemented a stricter multiple testing correction approach, improved the figures, included additional figures in the Supplementary Materials, considered the sex differences more rigorously and reported them in more detail. A comprehensive description of the revisions is provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uses information from the UK Biobank and aims to investigate the role of BMI on various health outcomes, with a focus on differences by sex. They confirm the relevance of many of the well-known associations between BMI and health outcomes for males and females and suggest that associations for some endpoints may differ by sex. Overall their conclusions appear supported by the data. The significance of the observed sex variations will require confirmation and further assessment.

      Strengths:

      This is one of the first systematic evaluations of sex differences between BMI and health outcomes. The hypothesis that BMI may be associated with health differentially based on sex is relevant and even expected. As muscle is heavier than adipose tissue, and as men typically have more muscle than women, as a body composition measure BMI is sometimes prone to classifying even normal weight/muscular men as obese, while this measure is more lenient when used in women. Confirmation of the many well-known associations is as expected and attests to the validity of their approach. Demonstration of the possible sex differences is interesting, with this work raising the need for further study.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript. We have strengthened our paper by adding your insightful comment about the rationale for sex-specific analysis to the introduction:

      Weaknesses:

      (1) Many of the statistical decisions appeared to target power at the expense of quality/accuracy. For example, they chose to use self-reported information rather than doctor diagnoses for disease outcomes for which both types of data were available.

      Thank you for your valuable comments. We apologize for the lack of clarity in our original description of the phenotypes. Information about health in the UK Biobank was obtained at baseline from tests, measurements and self-reports. Subsequently, comprehensive data linkage to hospital admissions, death registries and cancer registries was implemented. However, data linkage to primary care data, such as doctor diagnoses, has not been comprehensively implemented for the UK Biobank, possibly for logistic reasons. Doctor diagnoses are only available for about half the cohort (https://www.ukbiobank.ac.uk/enable-your-research/about-our-data/health-related-outcomes-data). So, we used self-reported diagnoses because they are substantially more comprehensive than the doctor diagnoses. We have explained this point by making the following change to the Methods:

      “Where attributes were available from both self-report and doctor diagnosis, we used self-reports. This is because comprehensive record linkage to doctor diagnoses has not yet been fully implemented for the UK Biobank, so information from doctor diagnoses may not fully represent the broader UK Biobank cohort.”

      (2) Despite known problems and bias arising from the use of one sample approach, they chose to use instruments from the UK Biobank instead of those available from the independent GIANT GWAS, despite the difference in sample size being only marginally greater for UKB for the context. With the way the data is presented, it is difficult to assess the extent to which results are compatible across approaches.

      Thank you for your comments. We agree completely about the issues with a one-sample approach; please accept our apologies for not explaining our rationale. The sex-specific GIANT GWAS study is similar in size to the UK Biobank GWAS. However, the sex-specific GIANT GWAS is much less densely genotyped (~2.5 million variants) than the sex-specific UK Biobank GWAS (~10 million variants), so has less power, hence our use of the UK Biobank. To make this clear, we have added the number of variants in each study to the method section. Nevertheless, we also repeated the analysis using sex-specific GIANT, as now given in the methods.

      We amended the description in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 1). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the discussion we also made the following changes:

      “Tenth, although this study primarily utilized sex-specific BMI, we also conducted analyses using overall BMI from GIANT including the UK Biobank, which gave a generally similar interpretation (S Table 1). Using sex-specific BMI from the UK Biobank and GIANT may lead to lower statistical power than using overall population BMI but allows for the detection of traits that are affected differently by BMI by sex. Including findings from the overall population BMI from sex-combined GIANT (S Table 1) makes the results more comparable to previous similar studies.”

      (3) The approach to multiple testing correction appears very lenient, although the lack of accuracy in the reporting makes it difficult to know what was done exactly. The way it reads, FDR correction was done separately for men, and then for women (assuming that the duplication in tests following stratification does not affect the number of tests). In the second stage, they compared differences by sex using Z-test, apparently without accounting for multiple testing.

      Thank you, we have accounted for multiple comparisons when considering differences by sex and have made corresponding changes. Specifically, in the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery”
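
      The two steps described here can be sketched as follows. The z statistic is the standard test for the difference between two independent estimates; the Benjamini-Hochberg procedure is used as one common way of "allowing for false discovery" and is an assumption of this example, as are the function names. The estimates are assumed to be on the appropriate (linear) scale already.

```python
import math


def sex_difference_z(beta_men, se_men, beta_women, se_women):
    """Two-sided z-test for the difference between two independent estimates."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A sex difference would then be retained if its adjusted p-value falls below the chosen false discovery rate, with the correction applied across all phenotypes tested for a difference.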

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      We have correspondingly amended the discussion to reflect these changes by adding:

      “Whether the difference in ischemic heart disease rates between men and women that emerged in the US and the UK in the late 19th century (Nikiforov & Mamaev, 1998) is explained by rising BMI remains to be determined.”

      (4) Presentation lacks accuracy in a few places, hence assessment of the accuracy of the statements made by the authors is difficult.

      Thank you, we have revised the whole manuscript in order to improve clarity.

      (5) Conclusion (Abstract) "These findings highlight the importance of retaining a healthy BMI" is rather uninformative, especially as they claim that for some attributes the effects of BMI may be opposite depending on sex/gender.

      Thank you for your comments. We have changed the conclusion of the abstract, as given below:

      “Our study revealed that BMI might affect a wide range of health-related attributes and also highlights notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI to address inequities in health.”

      We have changed the Impact statement, as given below:

      “BMI may affect a wide range of health-related attributes and there are notable sex differences in its impact, including opposite associations for certain attributes, such as ApoB; and stronger effects in men, such as for cardiovascular diseases. Our findings underscore the need for nuanced, sex-specific policy related to BMI.”

      We have changed the conclusion of the paper, as given below:

      “Our contemporary systematic examination found BMI associated with a broad range of health-related attributes. We also found significant sex differences in many traits, such as for cardiovascular diseases, underscoring the importance of addressing higher BMI in both men and women possibly as a means of redressing differences in life expectancy. Ultimately, our study emphasizes the harmful effects of obesity and the importance of nuanced, sex-specific policy related to BMI to address inequities in health.”

      Reviewer #2 (Public review):

      Summary:

      In this present Mendelian randomization-phenome-wide association study, the authors found BMI to be positively associated with many health-related conditions, such as heart disease, heart failure, and hypertensive heart disease. They also found sex differences in some traits such as cancer, psychological disorders, and ApoB.

      Strengths:

      The use of the UK-biobank study with detailed phenotype and genotype information.

      Thank you for your valuable comments. We are grateful for the time and effort you have devoted to reviewing our manuscript.

      Weaknesses:

      (1) Previous studies have performed this analysis using the same cohort, with in-depth analysis. See this paper: Searching for the causal effects of body mass index in over 300,000 participants in UK Biobank, using Mendelian randomization. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007951

      Thank you for your valuable comments. We checked the paper carefully. It gives sex-specific estimates when the outcome was assessed in different ways in men and women, for example the question about number of children was asked in terms of live births in women and number of children fathered in men. In addition, for some significant findings, the authors investigated differences by sex. However, the paper did not use sex-specific BMI or sex-specific outcomes systematically. We have added this paper to our introduction and amended the text to explain the novelty of our study compared to previous studies.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes.”

      (2) I believe that the authors' claim, "To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes," is not well supported. They have not cited a relevant paper that conducted both overall and sex-stratified PheWAS using UK Biobank data with a detailed analysis. Given the prior study linked above, I am uncertain about the additional contributions of the present research.

      Thank you for your valuable comments; please accept our apologies for this oversight. As explained above, we have checked very carefully. There are three previous PheWAS for BMI: Hyppönen et al., 2019, Millard et al., 2015 and Millard et al. 2019. Hyppönen et al., 2019 and Millard et al., 2015 are not sex-specific. Millard et al. 2019 used sex-combined instruments, but some sex-specific outcomes, when the questions were asked sex-specifically, such as age at puberty asked as “age when periods started (menarche)” in women and “relative age of first facial hair” and “relative age voice broke” in men. When they found a factor significantly associated with BMI, they sometimes analyzed it further, including by sex, but they did not do the analysis systematically for men and women with sex-specific BMI and sex-specific outcomes. We have amended the introduction to clarify this point.

      “To our knowledge, no sex-specific PheWAS has investigated the effects of BMI on health outcomes (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019). To address this gap, we conducted a sex-specific PheWAS, using the largest available sex-specific GWAS of BMI, to explore the impact of sex-specific BMI on sex-specific health-related attributes.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Presentation, accuracy, and referencing:

      (1) The quality of the English language needs to be checked, including that all sentences carry all components required (including verbs).

      We thank the reviewer for this suggestion. The manuscript has undergone language editing by a native English speaker, with particular attention to grammatical completeness (including verb consistency and sentence structure). We have also clarified the ambiguities and inconsistencies in terms that the language editing identified. All revisions have been implemented in the updated manuscript.

      (2) The accuracy of statements needs to be checked. For example, in lines 82-83 it is not true that 2015/2019 was 'before the advent of large-scale GWAS studies'. In the context of the above in lines 83-85, how can reference be made to a study published in 2020 calling that 'previous' MR studies and how a trial published in 2016 is 'recent'? Please revise, and please also check the manuscript for any other issues with accuracy of this kind.

      We thank the reviewer for this suggestion. We have checked the manuscript and revised these sentences to be clearer, by making the following change.

      “Previous phenome-wide association studies using MR (MR-PheWASs) have identified impacts of sex-combined BMI on endocrine disorders, circulatory diseases, inflammatory and dermatological conditions, some biomarkers and feelings of nervousness (Hyppönen et al., 2019; Millard et al., 2015; Millard et al. 2019), but did not systematically use sex-specific BMI for the exposure or sex-specific outcomes. Previous MR studies and trials of incretins have expanded our knowledge about a broad range of effects of BMI (Larsson et al., 2020; Marso et al., 2016).”

      (3) The adequacy of referencing will need to be checked, e.g. line 136 "as recommended by UK biobank" is vague and needs to be referenced.

      We thank the reviewer for this suggestion. We have added citations.

      “We categorized attributes as age at recruitment, physical measures, lifestyle and environmental, medical conditions, operations, physiological factors, cognitive function, health and medical history, sex-specific factors, blood assays and urine assays, based on the UK Biobank categories (https://biobank.ndph.ox.ac.uk/ukb/cats.cgi).”

      (4) The accurate use of terminology needs to be checked. For example, BMI is a measure of adiposity, while high BMI (typically >30) is used to index obesity.

      We thank you for your comments. We have changed the descriptions to “overweight/obesity” throughout.

      (5) Figure 1, Please check that complete information is given for 'selection criteria' and that the rationale for all information included is clear. For example, it is currently unclear what is the distinction between the bottom two sections which both present a number of features included in the analyses? Also, the Box detailing exclusion of 3585 variables does not give the criteria for these exclusions. Please add.

      Thank you for your comments. We have re-presented and revised Figure 1. Specifically, we have revised the bottom two sections to give each reason for exclusion and the number excluded for that reason. The updated “Excluded: 3,572 phenotypes, for the reason listed below:” box now contains bullet points giving each reason for exclusion (e.g. age of certain diseases/disorders onset: 26, alcohol: 56).

      (6) Figure 4, does not look to be of typical publication quality.

      We thank you for your comments. We have used different colors to make it smaller and more readable. Please see Table 1.

      Analyses:

      (1) As it stands, it is very difficult for a reader to confirm the conclusion that similar findings are obtained both when using instruments from the UKB and GIANT based on data presented (S Table 1 and 2). I suggest two things.

      a) Organise S Table 1 and 2 by significance and category, with separation by highlighting for those which are significant under correction. I would consider merging these two tables, such that it would be easy for the reader to make the comparisons side by side. Consider presenting separate tables for the analyses for women and men.

      We thank you for your comments. We have followed your helpful advice and merged S Table 1 and S Table 2 into S Table 1. Furthermore, we have also merged S Table 5 to S Table 1.

      b) In S Table 3, please add information from related comparisons using the GIANT instruments. To support the authors' claim that associations are similar, but only the precision of estimation differed, you could consider adding information for numbers of associations for those that are directionally consistent and which have an association at least under nominal significance. For associations where this does not hold, I would refrain from making a claim that the results are not affected by the choice of instrument (or biases relating to the analysis conducted).

      We thank you for your comments. Among 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all showed consistent directions of effect. Similarly, for women, all of the 45 significant associations exhibited consistent directions for UK Biobank compared with GIANT instruments.

      In the sex-specific UK Biobank, there are 203 significant associations in men, and 232 significant associations in women. We have added: in the sex-specific GIANT, there are 46 significant associations in men, and 84 significant associations in women. In the sex-combined GIANT, there are 246 significant associations in men, and 276 significant associations in women. We have provided all this information in S Table 2.

      We added the following descriptions at the end of the results section:

      “Of the 42 significant sex-specific associations identified in both the UK Biobank and the sex-specific GIANT consortium for men, all were directionally consistent. Similarly, for women, all 45 such significant associations were directionally consistent.”

      We amended the following descriptions in the first paragraph of the results section:

      “Initial analysis using sex-specific BMI from the GIANT yielded similar estimates as when using sex-specific BMI from the UK Biobank but had fewer SNPs resulting in wider confidence intervals (S Table 1) and fewer significant associations (S Table 2). Analysis using sex-combined GIANT yielded more significant associations but lacks granularity, so we presented the results obtained using sex-specific BMI from the UK Biobank.”

      In the methods, we changed:

      “We obtained differences by sex using a z-test (Paternoster et al., 1998), which as recommended was on a linear scale for dichotomous outcomes (Knol et al., 2007; Rothman, 2008), then we identified which ones remained after allowing for false discovery.”

      We have made the following changes to the results section:

      “We found significant differences by sex in the associations of BMI with 105 health-related attributes (p-value<0.05); 46 phenotypes remained after allowing for false discovery (Table 1). Of these 46 differences most (35) were in magnitude but not direction, such as for SHBG, ischemic heart disease, heart attack, and facial aging, while 11 were directionally different.

      Notably, BMI was more strongly positively associated with myocardial infarction, major coronary heart disease events, ischemic heart disease, heart attack, and facial aging in men than in women. BMI was more strongly positively associated with diastolic blood pressure, and hypothyroidism/myxoedema in women than men. BMI was more strongly inversely associated with LDL-c, hay fever and allergic rhinitis in men than women. BMI was more strongly inversely associated with SHBG in women than men.

      BMI was inversely associated with ApoB, iron deficiency anemia, hernia, and total testosterone in men, while positively associated with these traits in women (Table 1). BMI was inversely associated with sensitivity/hurt feelings, and ever seeking medical advice for nerves, anxiety, tension, or depression in men. However, BMI was positively associated with sensitivity/hurt feelings and ever seeking medical advice for these same issues in women. BMI was positively associated with muscle or soft tissue injuries and haemorrhage from respiratory passages in men, whilst inversely associated with these traits in women.”

      (2) It is not clear what statistical criteria were used to determine sex differences, and the strategy/presentation should be clarified. In lines 229-231, it is implied that the 'significance' in one gender, but not in the other is used to indicate a difference. However, 'comparison of p-values' is not a valid statistical approach, and a more formal test (accounting for multiple testing would be warranted). It may be that a systematic approach has been implemented, but please check that it is adequately and accurately described to the reader.

      Please accept our apologies for being unclear. Corrections for multiple comparisons assume independent phenotypes; here, however, some phenotypes are not independent, so applying such a correction in men and women separately is quite strict. We added a correction for multiple comparisons to the assessment of sex differences, which is now given in Table 1. Initially, there were 105 significant associations (p-value for sex difference <0.05) (Table 1), and 46 associations remained after FDR correction (Table 1).
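
      The z-test for sex differences and the false-discovery step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the estimates and standard errors are made-up numbers, and the Benjamini-Hochberg procedure is assumed for the FDR step because the response does not name a specific method.

```python
import math

def sex_difference_z(beta_men, se_men, beta_women, se_women):
    """Paternoster-style z-test for the difference between two independent estimates."""
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives FDR control (assumed BH)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            largest_rank = rank
    survives = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            survives[index] = True
    return survives

# Hypothetical estimates (men vs. women) on a linear scale, for illustration only.
z, p = sex_difference_z(beta_men=0.5, se_men=0.1, beta_women=0.2, se_women=0.1)
survivors = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
```

      Testing the difference directly, rather than comparing which sex-specific p-values fall below a threshold, is what makes the comparison a formal one; the FDR step is then applied once to the set of sex-difference p-values.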

      Furthermore, we have made additional minor changes to clarify the wording.

      Knol, M. J., van der Tweel, I., Grobbee, D. E., Numans, M. F., & Geerlings, M. I. (2007). Estimating interaction on an additive scale between continuous determinants in a logistic regression model. Int J Epidemiol, 36(5), 1111-1118.

      Nikiforov, S. V., & Mamaev, V. B. (1998). The development of sex differences in cardiovascular disease mortality: a historical perspective. Am J Public Health, 88(9), 1348-1353. https://doi.org/10.2105/ajph.88.9.1348

      Paternoster, R., Brame, R., Mazerolle, P., & Piquero, A. (1998). Using the correct statistical test for the equality of regression coefficients. Criminology, 36(4), 859-866.

      Rothman, K. J., Greenland, S., & Lash, T. L. (2008). Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Evaluations:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures and analyses are solid. The findings are interesting and novel.

      In the original submission, it was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified. However, this concern has been satisfactorily addressed in the revision.

      We thank the reviewer for their positive evaluation and thoughtful comments.

      Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting movement direction or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. The statistical methodology seems valid, but due to its complexity it is not easy to understand. The methods especially those described in figures 3 and 4 should be explained better.  

      We thank the reviewer for the detailed evaluation. As suggested, we have further revised the Methods and Results sections, particularly the descriptions related to Figures 3 and 4, to enhance clarity. Please see the revisions highlighted in red in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The most important results here are in Figure 4, and they rely on methods explained in Figure 3. Figure 4 and the results in the figure are confusing.

      What is the red bar in 4B,E. What are the units of the Y axis in figure 4B,E?

      Does sequenceness have units? How do we interpret these magnitudes apart from the line of statistical significance? Shouldn't there be two lines, one for forward replay and the other for backward replay rather than a single line with positive and negative values? The term sequenceness is defined in figure 3, and is key. The replayed sequence in figure 4A,D seems to last about 120 ms.

      What is the meaning of having significance only within a window of 28-36 ms?

      We thank the reviewer for the careful reading and insightful comments. We apologize for the lack of clarity regarding these details in the previous version. As mentioned above, we have revised the Methods and Results sections to enhance clarity throughout the manuscript. For convenience, we provide detailed explanations addressing the specific points raised by the reviewer below.

      First, the red bars in Figures 4B and 4E indicate the lags when the evidence of sequenceness surpassed the statistical significance threshold, as determined by permutation testing. We have now explicitly clarified this in the revised figure captions.

      Second, sequenceness doesn’t have units. It corresponds to the regression coefficient (β) obtained from the second-level GLM in the TDLM framework. Specifically, in the first step of TDLM, we constructed an empirical transition matrix that quantifies the evidence for all possible transitions (e.g., 0° → 90°) at each time lag (Δt). In the second step, we evaluated the extent to which each model transition matrix (e.g., forward or backward transitions) predicts the empirical transition matrix at each Δt, yielding second-level β values. Sequenceness is defined as the difference between the β values for the forward and backward transition models, reflecting the relative strength and directionality of sequential replay. As it is derived from regression coefficients, sequenceness is inherently a unitless measure.

      Regarding the interpretation of sequenceness magnitudes beyond statistical significance, the β values reflect the extent to which the model transition matrix explains variance in the empirical transition matrix. While larger β values suggest stronger sequenceness, absolute magnitudes are influenced by various factors, such as between-participant noise. Therefore, the key criterion for interpreting these values is whether they surpass permutation-based significance thresholds, which indicate that the observed sequenceness is unlikely to have occurred by chance.

      Third, as the reviewer correctly pointed out, we initially computed two separate regression lines, one for forward replay and the other for backward replay. We then defined sequenceness as the contrast between the forward and backward replay (forward minus backward). This contrast approach is commonly used in previous studies to remove between-participant variance in the sequential replay per se, which may arise due to variability in task engagement or measurement sensitivity (Liu et al., 2021; Nour et al., 2021).

      Finally, regarding the duration of replay events, the example sequences shown in Figures 4A and 4D indeed span about 120 ms in total. However, the time lag (Δt) between successive reactivation peaks within these sequences is about 30 ms. This is in line with the findings shown in Figures 4B and 4E, where statistical significance is observed at a time lag window of 28 – 36 ms on the x-axis. It is important to note that the x-axis in these plots represents the time lag (Δt) between sequential reactivations, rather than absolute time.
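
      To make the two-level procedure above concrete, here is a heavily simplified sketch of the sequenceness computation. It is an illustration under stated assumptions, not the authors' pipeline or the reference TDLM implementation: the function name `sequenceness`, the synthetic data, the lag expressed in samples, and the choice of nuisance regressors (self-transitions and a constant) are all assumptions made for the example.

```python
import numpy as np

def sequenceness(decoded, forward_model, max_lag):
    """Forward-minus-backward sequenceness for lags 1..max_lag (in samples).

    decoded: (n_times, n_states) array of decoded state time courses.
    forward_model: (n_states, n_states) matrix with 1 where state i -> state j.
    """
    n_times, n_states = decoded.shape
    backward_model = forward_model.T
    # Second-level design: forward, backward, self-transition, and constant terms.
    design = np.column_stack([
        forward_model.ravel(),
        backward_model.ravel(),
        np.eye(n_states).ravel(),
        np.ones(n_states * n_states),
    ])
    result = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        # First level: regress the lagged time courses on the current ones.
        past = np.column_stack([decoded[:-lag], np.ones(n_times - lag)])
        future = decoded[lag:]
        coefs, *_ = np.linalg.lstsq(past, future, rcond=None)
        empirical = coefs[:n_states]  # rows: source state, columns: target state
        # Second level: how well does each model matrix explain the empirical one?
        betas, *_ = np.linalg.lstsq(design, empirical.ravel(), rcond=None)
        result[lag - 1] = betas[0] - betas[1]  # unitless contrast of coefficients
    return result

# Synthetic example: four states reactivating in forward order, 3 samples apart.
rng = np.random.default_rng(0)
n_times, n_states, true_lag = 2000, 4, 3
decoded = 0.1 * rng.standard_normal((n_times, n_states))
for onset in rng.integers(0, n_times - n_states * true_lag, size=100):
    for state in range(n_states):
        decoded[onset + state * true_lag, state] += 1.0
forward = np.diag(np.ones(n_states - 1), k=1)
seq = sequenceness(decoded, forward, max_lag=10)
```

      In this toy example the contrast peaks at the lag separating successive reactivations, not at the total duration of the sequence, which mirrors the distinction drawn above between the 120 ms span of a replay event and the roughly 30 ms lag on the x-axis.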

      We hope these clarifications address the reviewer’s concerns, and we have revised the manuscript accordingly to make these points clearer to readers.

      The methods here are not simple and not simple to explain. The new version is easier to understand. From the new version it seems that the methodology is sound. It should be still clarified and better explained.

      We have carefully revised the manuscript to better explain the methodology. We appreciate the reviewer’s feedback, which is valuable in improving the clarity of our work.

      Now that I understand what they mean by decoding probability, I think that this term is confusing or even misleading. The decoding accuracy is the probability that the direction of motion classification was correct. It seems the so-called decoding probability is the value of the logistic regression after normalizing the sum to 1. If this is a standard term it can probably be kept; if not, another term would be better.

      We thank the reviewer for this comment. We agree that the term decoding probability may initially seem confusing. However, decoding probability is a commonly used term in the neural decoding literature, particularly in human studies (e.g., Liu et al., 2019; Nour et al., 2021; Turner et al., 2023). To maintain consistency with previous work, we have kept this term in the manuscript. We appreciate the opportunity to clarify this point.

      References

      Liu, Y., Dolan, R. J., Higgins, C., Penagos, H., Woolrich, M. W., Ólafsdóttir, H. F., Barry, C., Kurth-Nelson, Z., & Behrens, T. E. (2021). Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife, 10, e66917. https://doi.org/10.7554/eLife.66917

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Nour, M. M., Liu, Y., Arumuham, A., Kurth-Nelson, Z., & Dolan, R. J. (2021). Impaired neural replay of inferred relationships in schizophrenia. Cell, 184(16), 4315-4328.e17. https://doi.org/10.1016/j.cell.2021.06.012

      Turner, W., Blom, T., & Hogendoorn, H. (2023). Visual Information Is Predictively Encoded in Occipital Alpha/Low-Beta Oscillations. Journal of Neuroscience, 43(30), 5537–5545. https://doi.org/10.1523/JNEUROSCI.0135-23.2023

    1. Author response:

      We thank the editors and the reviewers for their valuable comments and for taking the time to evaluate our manuscript.

      Answers to Reviewer 1:

      (1) The core contribution of our method is that it learns meaningful spatiotemporal embeddings directly from image data without requiring pose estimation or eigenworm-based features as input. The learned embedding space can serve as a foundation for downstream tasks such as behavioral classification, clustering, or anomaly detection, further supporting its utility beyond visualization through eigenworm-derived features. Here we use the Tierpsy-derived features for latent space interpretation and for validation that our approach does indeed encode meaningful postural information. Additionally, without any Tierpsy-calculated features users can still color embeddings by known metadata like mutation or age and compare different strains to each other. 

      (2) The numbers shown in Fig. 2.3 are illustrative placeholders intended to conceptually represent a vector of behavioral features. They do not correspond to any specific measurements or carry intrinsic meaning. We agree that this may lead to confusion, and we will clarify this in the revised manuscript.

      (3) The visualizations in Figs. 4 (b) and (c) show the embeddings of sequences of behavior, rather than individual poses. Therefore, motion-related features such as speed are related to temporal patterns in those sequences rather than static postures. The color overlays reflect average motion characteristics (e.g., speed) of short behavior clips projected into the embedding space, rather than being directly linked to any single frame or pose.

      Answers to Reviewer 2:

      (1) In the abstract, our use of the term "unbiased" refers specifically to the avoidance of human-generated bias through feature engineering—i.e., the model does not rely on handcrafted features or predefined pose representations – the representations are based on data only. However, we agree that the model is still subject to dataset biases and will rectify this in the revised manuscript.

      (2) The worm images are rotated to a common vertical orientation to remove orientation as a source of variability in the input. This ensures that the model focuses on learning pose and behavioral dynamics rather than arbitrary head-tail or angular positioning. While data augmentation could in theory account for this variability, we found in our preliminary experiments that applying this preprocessing step led to more stable and interpretable embeddings.

      (3) We agree that simplifying the technical explanations would enhance the manuscript’s accessibility. In the revised version, we will briefly introduce contrastive learning in a less technical language.

      (4) The gray points in Fig. 3a represent frames that Tierpsy could not resolve, primarily due to coiled, self-intersecting, or overlapping worm postures, as Tierpsy uses skeletonization to estimate the centerline. This approach can fail when such challenging postures are present in the image.

      (5) We appreciate this suggestion and will consider it for a revised version of the manuscript.

      (6) Although it may seem intuitive for highly bent (red) poses to lie near coiled (gray) ones in the embedding space, the clustering pattern observed reflects how the network organizes pose information. The red/orange cluster consists of distinguishable bent poses that are visually distinct and consistently separable from other postures. In contrast, the greenish and blueish poses are less strongly bent and may share more visual overlap with the unresolved (gray) images.

      (7) The overlap occurs because some highly bent or coiled worms can still be (partially) resolved by Tierpsy, depending on specific pose conditions (e.g., head and tail not touching, not self-overlapping). However, Tierpsy fails to consistently resolve such frames. We will describe these cases in more detail in the revised manuscript.

      (8) Thank you, we agree this claim needs to be better supported and will develop it in the revision.

      (9) To support this statement we mainly visualized the respective sequences embedded in this area of the embedding space and found that it mostly consists of common behaviors such as forward locomotion. 

      (10) We agree that interpretability is important and plan to include additional figures and quantifications of the embedding space using more basic Tierpsy features.

      (11) Fig. 5a is indeed based solely on N2 animals. In the revised manuscript we will include quantitative measures of behavioral variability and its change with age.

      (12) We appreciate this suggestion and will consider it for a revised version.

      (13) We agree this would be a valuable analysis. However, our current dataset primarily includes aging data for N2 animals. We acknowledge this limitation and will consider adding more strains in future work.

      (14) We will include links to our source code in the revised manuscript.

      Answers to Reviewer 3:

      (1-2) Our current method is agnostic to head-tail orientation, which indeed restricts the ability to distinguish behaviors that rely on directional cues. We made this design choice as we believe that correctly identifying head/tail orientation can be a challenging task that may introduce additional biases or fail in difficult imaging conditions. However, we fully agree that integrating directional information would improve behavioral resolution, and this is a natural extension of our current framework. In future work, we aim to incorporate head-tail disambiguation.

      (3) We explicitly designed our preprocessing and training pipeline to encourage size invariance, for example by resizing individuals to a consistent scale, as the focus of our work is to encode posture and movement only. However, we acknowledge that absolute size information is lost in this process, which can be informative for distinguishing genotypes or age-related changes.

      (4) We agree that a direct quantitative comparison between our embedding-based representations and skeleton-based feature sets would strengthen the paper. Our current focus was to assess whether meaningful behavioral features could be learned from a skeleton-free representation.

    1. Author response:

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Given that the target SSVEP amplitude is stronger than that for the distractor, it is possible that the distractor SSVEP amplitude is contaminated by the target SSVEP amplitude due to spectral power leakage; see Figure S4 for a demonstration of this. Because of this issue, we introduce decoding accuracy as an index of distractor processing, which has not been done before in the SSVEP literature. Comparing the distractor SSVEP amplitude with the distractor decoding accuracy is, as the reviewer points out, somewhat like comparing apples with oranges. Nevertheless, the lack of correlation between them shows that the two measures do not co-vary, so decoding accuracy is free from the influence of the distractor SSVEP amplitude and thereby free from the influence of the target SSVEP amplitude. This is an important point. We will provide a more thorough discussion of this point in the revised manuscript.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      This is an important point. We plan to follow the reviewer’s suggestion and repeat our analysis using different window sizes to test the robustness of the observed 1 Hz rhythmicity. In addition, we plan to apply the Hilbert transform to extract time-point-by-time-point amplitude envelopes, which will provide a window-free estimate of the distractor strength and further validate the presence of the low-frequency 1 Hz dynamics.
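
      As a rough illustration of what a window-free amplitude estimate looks like, the sketch below computes a Hilbert envelope with NumPy's FFT (`scipy.signal.hilbert` returns the same analytic signal). Everything in it is an assumption for the example: the function name `analytic_envelope`, the 250 Hz sampling rate, the 10 Hz carrier standing in for an SSVEP frequency, and the 1 Hz modulation. In practice the signal would first be band-pass filtered around the frequency of interest.

```python
import numpy as np

def analytic_envelope(signal):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert transform)."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Keep DC (and Nyquist, if present), double positive frequencies, zero negatives.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

# Hypothetical signal: a 10 Hz carrier whose amplitude fluctuates at 1 Hz.
fs = 250.0
t = np.arange(1000) / fs
modulation = 1.0 + 0.5 * np.cos(2 * np.pi * 1.0 * t)
carrier = np.cos(2 * np.pi * 10.0 * t)
envelope = analytic_envelope(modulation * carrier)
```

      Because the envelope is defined at every sample, its 1 Hz content does not depend on a window length or step size, which is the property the reviewer's concern calls for.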

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is a good point. In addition to acknowledging this in the revised manuscript, we will carry out two additional analyses to test this issue further. First, we will implement a random permutation procedure, in which the trial labels are randomly shuffled to build the null-hypothesis distribution for decoding accuracy, and compare the decoding accuracy from the actual data to this distribution. Second, we will perform a temporal generalization analysis to examine whether the neural representations of the distractor drift over the course of an entire trial, which is 11 seconds long. Recent studies suggest that even when a stimulus stays the same, its neural representation may drift over time.
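
      The label-shuffling procedure can be sketched as below. This is a generic illustration rather than the authors' analysis: the nearest-centroid decoder, the five-fold split, the number of permutations, and the synthetic two-class data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def cv_accuracy(features, labels, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid decoder (classes 0 and 1)."""
    fold_of_trial = np.arange(len(labels)) % n_folds
    n_correct = 0
    for fold in range(n_folds):
        test = fold_of_trial == fold
        train = ~test
        centroids = np.stack(
            [features[train & (labels == c)].mean(axis=0) for c in (0, 1)]
        )
        distances = np.linalg.norm(
            features[test][:, None, :] - centroids[None, :, :], axis=2
        )
        n_correct += np.sum(distances.argmin(axis=1) == labels[test])
    return n_correct / len(labels)

def permutation_p_value(features, labels, n_permutations=200, seed=0):
    """Compare observed accuracy with a null distribution from shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(features, labels)
    null = np.array([
        cv_accuracy(features, rng.permutation(labels))
        for _ in range(n_permutations)
    ])
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_permutations + 1)
    return observed, p_value

# Synthetic trials: 40 trials, 5 features, with a clear class difference.
rng = np.random.default_rng(1)
labels = np.tile([0, 1], 20)
features = rng.standard_normal((40, 5))
features[labels == 1] += 3.0
observed, p_value = permutation_p_value(features, labels)
```

      The value of the permutation approach for accuracies near 55% is that the null distribution is built from the same data and cross-validation scheme, so it reflects the actual chance level rather than a nominal 50%.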

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We felt that what the reviewer pointed out is actually the main point of our study, namely, it is not the overall target or distractor strength that matters for behavior, it is their temporal relationship that matters for behavior. This reveals a novel neuroscience principle that has not been reported in the past. We will stress this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We plan to normalize each time course, make them dimensionless, and then compute the temporal relations between them.   
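
      One way to picture the planned normalization is to z-score each time course and then compare their phases at the frequency of interest. The sketch below is only an assumed reading of that plan: the function names, the use of a single Fourier bin to estimate phase, and the synthetic time courses are hypothetical, and the 4 Hz sampling rate simply corresponds to the 0.25 s window step mentioned in the review.

```python
import numpy as np

def zscore(x):
    """Remove the mean and divide by the standard deviation (dimensionless output)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def relative_phase(target, distractor, fs, freq):
    """Phase of the target minus phase of the distractor at `freq`, in radians."""
    n = len(target)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - freq)))  # Fourier bin closest to freq
    target_coef = np.fft.rfft(zscore(target))[k]
    distractor_coef = np.fft.rfft(zscore(distractor))[k]
    return float(np.angle(target_coef * np.conj(distractor_coef)))

# Synthetic time courses with very different units and scales.
fs = 4.0                      # one sample every 0.25 s
t = np.arange(40) / fs        # 10 s of data
target_amplitude = 5.0 + 2.0 * np.cos(2 * np.pi * 1.0 * t)
distractor_accuracy = 0.55 + 0.01 * np.cos(2 * np.pi * 1.0 * t - np.pi / 2)
phase = relative_phase(target_amplitude, distractor_accuracy, fs, freq=1.0)
```

      Phase is unaffected by the offset and scale of each signal, so z-scoring mainly serves to put the two time courses on a common footing for any amplitude-based comparison made alongside the phase analysis.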

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. We will try our best in the revision process to address the concerns.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI, given its well-known limitation in temporal resolution, has limited potential to contribute. We will explore this rich dataset in other ways, integrating the two modalities to gain insights not possible with either modality alone.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz predict behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact. However, we will look into this carefully and address it in the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      Thank you for the feedback! We agree that the complexity of our model can make it challenging to intuitively understand the underlying mechanisms. To address this, we have revised the manuscript to include additional simulations and clearer explanations of the mechanisms at play.

      In the revised introduction, we now explicitly state our primary aim: to assess to what extent a biophysically detailed neuron model can support the theory proposed by Tran-Van-Minh et al. and explore whether such computations can be learned by a single neuron, specifically a projection neuron in the striatum. To achieve this, we focus on several key mechanisms:

      (1) A local learning rule: We develop a learning rule driven by local calcium dynamics in the synapse and by reward signals from the neuromodulator dopamine. This plasticity rule is based on the known synaptic machinery for triggering LTP or LTD in the corticostriatal synapse onto dSPNs (Shen et al., 2008). Importantly, the rule does not rely on supervised learning paradigms, nor does it require separate training and testing phases.

      (2) Robust dendritic nonlinearities: According to Tran-Van-Minh et al. (2015), sufficient supralinear integration is needed to ensure that, e.g., two inputs (i.e. one feature combination in the NFBP, Figure 1A) on the same dendrite generate greater somatic depolarization than if those inputs were distributed across different dendrites. To accomplish this, we generate sufficiently robust dendritic plateau potentials using the approach in Trpevski et al. (2023).

      (3) Metaplasticity: Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights. This mechanism ensures that synaptic strengths remain within biologically plausible ranges during training, regardless of initial synaptic weights.
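      The supralinear-integration requirement in point (2) can be illustrated with a toy calculation. The sketch below is not the biophysical model from the manuscript: it assumes a sigmoidal branch nonlinearity as a stand-in for a plateau potential, a soma that simply sums branch outputs, and arbitrary values for the hypothetical parameters `threshold` and `steepness`. It only shows why two co-active inputs on one branch can drive the soma more strongly than the same two inputs split across branches.

```python
import math

def branch_output(n_active, threshold=1.5, steepness=8.0):
    """Sigmoidal branch nonlinearity (hypothetical stand-in for a plateau potential).

    n_active is the number of co-active synapses on the branch; the result is a
    dimensionless contribution to somatic depolarization between 0 and 1.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (n_active - threshold)))

def somatic_drive(active_per_branch):
    """Sum the branch outputs, treating the soma as a linear summation point."""
    return sum(branch_output(n) for n in active_per_branch)

# Two co-active inputs on one branch versus one input on each of two branches.
clustered = somatic_drive([2, 0])
distributed = somatic_drive([1, 1])

print(f"clustered:   {clustered:.3f}")
print(f"distributed: {distributed:.3f}")
```

      With a steep enough nonlinearity the clustered arrangement dominates; making `steepness` small (a more graded plateau) shrinks the difference between the two arrangements.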

      We have also clarified our design choices and the rationale behind them, as well as restructured the interpretation of our results for greater accessibility. We hope these revisions make our approach and findings more transparent and easier to engage with for a broader audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study extends three previous lines of work:  

      (1) Prior computational/phenomenological work has shown that the presence of dendritic nonlinearities can enable single neurons to perform linearly non-separable tasks like XOR and feature binding (e.g. Tran-Van-Minh et al., Front. Cell. Neurosci., 2015).

      Prior computational and phenomenological work, such as Tran-Van-Minh et al. (Front. Cell. Neurosci., 2015), directly inspired our study, as we now explicitly state in the introduction (page 4, lines 19-22). While Tran-Van-Minh et al. demonstrated theoretically that these principles could solve the NFBP, it remains untested to what extent this can be achieved quantitatively in biophysically detailed neuron models using biologically plausible learning rules, which is what we test here.

      (2) This study and a previous biophysical modeling study (Trpevski et al., Front. Cell. Neurosci., 2023) rely heavily on the finding from Chalifoux & Carter, J. Neurosci., 2011 that blocking glutamate transporters with TBOA increases dendritic calcium signals. The proposed model thus depends on a specific biophysical mechanism for dendritic plateau potential generation, where spatiotemporally clustered inputs must be co-activated on a single branch, and the voltage compartmentalization of the branch and the voltage-dependence of NMDARs is not enough, but additionally glutamate spillover from neighboring synapses must activate extrasynaptic NMDARs. If this specific biophysical implementation of dendritic plateau potentials is essential to the findings in this study, the authors have not made that connection clear. If it is a simple threshold nonlinearity in dendrites that is important for the model, and not the specific underlying biophysical mechanisms, then the study does not appear to provide a conceptual advance over previous studies demonstrating nonlinear feature binding with simpler implementations of dendritic nonlinearities.

      We appreciate the feedback on the hypothesized role of glutamate spillover in our model. The current manuscript and Trpevski et al. (2023) emphasize glutamate spillover as a plausible biophysical mechanism for producing sufficiently robust and supralinear plateau potentials, but we acknowledge that supralinear dendritic integration might not depend solely on this specific mechanism in other types of neurons. In Trpevski et al. (2023) we found, however, that the NFBP was difficult to solve if the dendritic plateaus were too ‘graded’, as they are with the quite shallow Mg-block reported in experiments. The conceptual advance of our study lies in demonstrating that sufficiently nonlinear dendritic integration is needed, and that in SPNs this can be accounted for by assuming spillover. Regardless of its biophysical source (e.g. NMDA spillover, steeper NMDA Mg-block activation curves, or other voltage-dependent conductances that cause supralinear dendritic integration), such nonlinearity enables biophysically detailed neurons to solve the nonlinear feature binding problem. To address this point and clarify the generality of our conclusions, we have revised the relevant sections in the manuscript to state this explicitly.

      (3) Prior work has utilized "sliding-threshold," BCM-like plasticity rules to achieve neuronal selectivity and stability in synaptic weights. Other work has shown coordinated excitatory and inhibitory plasticity. The current manuscript combines "metaplasticity" at excitatory synapses with suppression of inhibitory strength onto strongly activated branches. This resembles the lateral inhibition scheme proposed by Olshausen (Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, Bruno A. Olshausen; Sparse Coding via Thresholding and Local Competition in Neural Circuits. Neural Comput 2008; 20 (10): 2526-2563. doi: https://doi.org/10.1162/neco.2008.03-07-486). However, the complexity of the biophysical model makes it difficult to evaluate the relative importance of the additional complexity of the learning scheme.

      We initially tried solving the NFBP with only excitatory plasticity, which worked reasonably well, especially if we assume a small population of neurons collaborates under physiological conditions. However, we observed that plateau potentials from distally located inputs were less effective, and we now explain this limitation in the revised manuscript (page 14, lines 23-37).

      To address this, we added inhibitory plasticity inspired by mechanisms discussed in Castillo et al. (2011), Ravasenga et al., and Chapman et al. (2022), as now explicitly stated in the text (page 32, lines 23-26). While our GABA plasticity rule is speculative, it demonstrates that distal GABAergic plasticity can enhance nonlinear computations. These results are particularly encouraging, as they show that implementing these mechanisms at the single-neuron level produces behavior consistent with network-level models such as BCM-like plasticity rules and those proposed by Rozell et al. We hope this will inspire further experimental work on inhibitory plasticity mechanisms.

      P2, paragraph 2: Grammar: "multiple dendritic regions, preferentially responsive to different input values or features, are known to form with close dendritic proximity." The meaning is not clear. "Dendritic regions" do not "form with close dendritic proximity."

      Rewritten (current page 2, line 35)

      P5, paragraph 3: Grammar: I think you mean "strengthened synapses" not "synapses strengthened".

      Rewritten (current page 14, line 36)

      P8, paragraph 1: Grammar: "equally often" not "equally much".

      Updated (current page 10, line 2)

      P8, paragraph 2: "This is because of the learning rule that successively slides the LTP NMDA Ca-dependent plasticity kernel over training." It is not clear what is meant by "sliding," either here or in the Methods. Please clarify.

      We have updated the text and removed the word “sliding” throughout the manuscript to clarify that the calcium dependence of the kernels is in fact updated.

      P10, Figure 3C (left): After reading the accompanying text on P8, para 2, I am left not understanding what makes the difference between the two groups of synapses that both encode "yellow," on the same dendritic branch (d1) (so both see the same plateau potentials and dopamine) but one potentiates and one depresses. Please clarify.

      Some "yellow" and "banana" synapses are initialized with weak conductances, limiting their ability to learn due to the relatively slow dynamics of the LTP kernel. These weak synapses fail to reach the calcium thresholds necessary for potentiation during a dopamine peak, yet they remain susceptible to depression under LTD conditions. Initially, the dynamics of the LTP kernel do not allow significant potentiation, even in the presence of appropriate signals such as plateau potentials and dopamine (page 10, lines 22–26). We have added a more detailed explanation of how the learning rule operates in the section “Characterization of the Synaptic Plasticity Rule” on page 9 and have clarified the specific reason why the weaker yellow synapses undergo LTD (page 11, lines 1–7).

      As shown in Supplementary Figure 6, during subthreshold learning, the initial conductance is also low, which similarly hinders the synapses' ability to potentiate. However, with sufficient dopamine, the LTP kernel adapts by shifting closer to the observed calcium levels, allowing these synapses to eventually strengthen. This dynamic highlights how the model enables initially weak synapses to "catch up" under consistent activation and favorable dopaminergic conditions.
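      The "catch-up" behaviour described above can be sketched with a deliberately simplified toy rule. This is not the learning rule from the manuscript: the Gaussian kernel shape, the linear relation between weight and calcium, the omission of LTD and dopamine pauses, and every parameter value (`width`, `lr`, `shift_rate`) are assumptions made only for illustration. Every trial is treated as rewarded, so each update stands for a dopamine peak.

```python
import math

def ltp_kernel(calcium, midpoint, width=0.15):
    """Bell-shaped LTP kernel over calcium; the Gaussian shape is an assumption."""
    return math.exp(-((calcium - midpoint) ** 2) / (2.0 * width ** 2))

def train(w0, midpoint0, n_trials=40, lr=0.05, shift_rate=0.0):
    """Run repeated rewarded trials and return the final weight.

    shift_rate = 0 keeps the kernel fixed; shift_rate > 0 moves the kernel
    midpoint toward the calcium level observed on each trial (toy metaplasticity).
    """
    w, midpoint = w0, midpoint0
    for _ in range(n_trials):
        calcium = w  # assumption: calcium scales linearly with synaptic weight
        w += lr * ltp_kernel(calcium, midpoint)
        midpoint += shift_rate * (calcium - midpoint)
    return w

# A weak synapse whose calcium starts far below the kernel midpoint.
w_fixed = train(w0=0.2, midpoint0=1.0, shift_rate=0.0)
w_meta = train(w0=0.2, midpoint0=1.0, shift_rate=0.1)

print(f"fixed kernel:    {w_fixed:.3f}")
print(f"shifting kernel: {w_meta:.3f}")
```

      In this toy, the weak synapse barely changes when the kernel is fixed, whereas it strengthens once the kernel is allowed to shift toward the observed calcium. The toy has no mechanism to bound the weight, so it says nothing about the stability properties discussed in the manuscript.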

      P9, paragraph 1: The phrase "the metaplasticity kernel" is introduced here without prior explanation or motivation for including this level of complexity in the model. Please set it up before you use it.

      A sentence introducing metaplasticity has been added to the introduction (page 3, lines 36-42) as well as on page 9, where the kernel is introduced (page 9, lines 26-35).

      P10, Figure 3D: "kernel midline" is not explained.

      We have replotted Figure 3 to make it easier to understand what is shown. An explanation of the kernel midpoint has also been added to the legend (current page 12, line 19).

      P11, paragraph 1; P13, Fig. 4C: My interpretation of these data is that clustered connectivity with specific branches is essential for the performance of the model. Randomly distributing input features onto branches (allowing all 4 features to innervate single branches) results in poor performance. This is bad, right? The model can't learn unless a specific pre-wiring is assumed. There is not much interpretation provided at this stage of the manuscript, just a flat description of the result. Tell the reader what you think the implications of this are here.

      Thanks for the suggestion - we have updated this section of the manuscript, adding an interpretation of the results that the model often fails to learn both relevant stimuli if all four features are clustered onto the same dendrite (page 13, lines 31-42). 

      In summary, when multiple feature combinations are encoded in the same dendrite with similar conductances, the ability to determine which combination to store depends on the dynamics of the other dendrite. Small variations in conductance, training order, or other stochastic factors can influence the outcome. This challenge, known as the symmetry-breaking problem, has been previously acknowledged in abstract neuron models (Legenstein and Maass, 2011). To address this, additional mechanisms such as branch plasticity—amplifying or attenuating the plateau potential as it propagates from the dendrite to the soma—can be employed (Legenstein and Maass, 2011). 

      P12, paragraph 2; P13, Figure 4E: This result seems suboptimal, that only synapses at a very specific distance from the soma can be used to effectively learn to solve a NFBP. It is not clear to what extent details of the biophysical and morphological model are contributing to this narrow distance-dependence, or whether it matches physiological data.

      We have added Figure 5—figure supplement 1A to clarify why distal synapses may not optimally contribute to learning. This figure illustrates how inhibitory plasticity improves performance by reducing excessive LTD at distal dendrites, thereby enhancing stimulus discrimination. Relevant explanations have been integrated into Page 18, Lines 25-39 in the revised manuscript.

      P14, paragraph 2: Now the authors are assuming that inhibitory synapses are highly tuned to stimulus features. The tuning of inhibitory cells in the hippocampus and cortex is controversial but seems generally weaker than excitatory cells, commensurate with their reduced number relative to excitatory cells. The model has accumulated a lot of assumptions at this point, many without strong experimental support, which again might make more sense when proposing a new theory, but this stitching together of complex mechanisms does not provide a strong intuition for whether the scheme is either biologically plausible or performant for a general class of problem.

      We acknowledge that it is not currently known whether inhibitory synapses in the striatum are tuned to stimulus features. However, given that the striatum is a purely inhibitory structure, it is plausible that lateral inhibition from other projection neurons could be tuned to features, even if feedforward inhibition from interneurons is not. Therefore, we believe this assumption is reasonable in the context of our model. As noted earlier, the GABA plasticity rule in our study is speculative. However, we hope that our work will encourage further experimental investigations, as we demonstrate that if GABAergic inputs are sufficiently specific, they can significantly enhance computations (this is discussed on page 17, lines 8-15).

      P16, Figure 5E legend: The explanation of the meaning of T_max and T_min in the legend and text needs clarification.

      The abbreviations T_min and T_max have been updated to CTL and CTH to better reflect their role in calcium threshold tracking. The Figure 5E legend and relevant text have been revised for clarity. Additionally, the Methods section has been reorganized for better readability.

      P16, Figure 5B, C: When the reader reaches this paper, the conundrums presented in Figure 4 are resolved. The "winner-takes-all" inhibitory plasticity both increases the performance when all features are presented to a single branch and increases the range of somatodendritic distances where synapses can effectively be used for stimulus discrimination. The problem, then, is in the narrative. A lot more setup needs to be provided for the question related to whether or not dendritic nonlinearity and synaptic inhibition can be used to perform the NFBP. The authors may consider consolidating the results of Fig. 4 and 5 so that the comparison is made directly, rather than presenting them serially without much foreshadowing.

      In order to facilitate readability, we have updated the following sections of the manuscript to clarify how inhibitory plasticity resolves challenges from Figure 4:

      Figure 5B and Figure 5–figure supplement 1B: Two new panels illustrate the role of inhibitory plasticity in addressing symmetry problems.

      Figure 5–figure supplement 1A: Shows how inhibitory plasticity extends the effective range of somatodendritic distances.

      P18, Figure 6: This should be the most important figure, finally tying in all the previous complexity to show that NFBP can be partially solved with E and I plasticity even when features are distributed randomly across branches without clustering. However, now bringing in the comparison across spillover models is distracting and not necessary. Just show us the same plateau generation model used throughout the paper, with and without inhibition.

      Figure updated. Accumulative spillover and no-spillover conditions have been removed.

      P18, paragraph 2: "In Fig. 6C, we report that a subset of neurons (5 out of 31) successfully solved the NFBP." This study could be significantly strengthened if this phenomenon could (perhaps in parallel) be shown to occur in a simpler model with a simpler plateau generation mechanism. Furthermore, it could be significantly strengthened if the authors could show that, even if features are randomly distributed at initialization, a pruning mechanism could gradually transition the neuron into the state where fewer features are present on each branch, and the performance could approach the results presented in Figure 5 through dynamic connectivity.

      Modeling structural plasticity is a good suggestion that should be investigated in later work. However, we feel that it goes beyond what we can do in the current manuscript. We now acknowledge that structural plasticity might play a role. For example, we show that if we assume ‘branch-specific’ spillover, which leads to sufficient development of local dendritic nonlinearities, the neuron can also learn with distributed inputs. In reality, structural plasticity is likely important here, as we now state (current page 22, lines 35-42).

      P17, paragraph 2: "As shown in Fig. 6B, adding the hypothetical nonlinearities to the model increases the performance towards solving part of the NFBP, i.e. learning to respond to one relevant feature combination only. The performance increases with the amount of nonlinearity." This is not shown in Figure 6B.

      Sentence removed. We have added Figure 6 - figure supplement 1 to better explain the limitations.

      P22, paragraph 1: The "w" parameter here is used to determine whether spatially localized synapses are co-active enough to generate a plateau potential. However, this is the same w learned through synaptic plasticity. Typically LTP and LTD are thought of as changing the number of postsynaptic AMPARs. Does this "w" also change the AMPAR weight in the model? Do the authors envision this as a presynaptic release probability quantity? If so, please state that and provide experimental justification. If not, please justify modifying the activation of postsynaptic NMDARs through plasticity.

      This is an important remark. Our plasticity model differs from classical LTP models as it depends on the link between LTP and increased spillover as described by Henneberger et al. (2020).

      We have updated the Methods section (page 27, lines 6-11). We acknowledge, however, that in a real cell, learning might first strengthen the AMPA component, while after learning the NMDA/AMPA ratio is unchanged (Watt et al., 2004). This rebalancing between NMDA and AMPA might be a slower process.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      Thanks for the valuable feedback. We have now gone through the whole manuscript, updating the text, improving the figures, and adding some supplementary figures to better explain the model mechanisms. In particular, we now state our goal more clearly in the introduction.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The model is quite detailed but builds on previous work. For this reason, for model components used in earlier published work (and where models are already available via model repositories, such as ModelDB), we refer the reader to these resources in order to improve readability and to highlight what is novel in this paper: the learning rule itself. The learning rule is now explained in detail. For modelers who want to run the model, we have also provided a GitHub link to the simulation code. We hope this is a reasonable compromise for all readers, i.e., those who only want to understand what is new here (the learning rule) and those who also want to test the model code. We explain this to the readers at the beginning of the Methods section.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      Thanks for directing our attention to these oversights. We have gone through the entire manuscript, updating the figures where needed, and we are making sure that the text and the figure descriptions are clear and adequate and use consistent terminology for all quantities.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The “Metaplasticity” section (pages 30-32) has been updated to be more concise, and the abundant references to dopamine have been removed.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      We have explicitly stated our goal in the introduction (page 4, lines 19-22). Please also see the response to reviewer 1.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      Thank you for your feedback. To address the concern regarding feature complexity, we extended our simulations to include learning with 9 and 25 features, achieving accuracies of 80% and 75%, respectively (Figure 6—figure supplement 1A). While our results demonstrate effective performance, the absence of external stabilizers—such as error-modulated functions used in prior studies like Bicknell and Hausser (2021)—means that the model's performance can be more sensitive to occasional incorrect outcomes. For instance, while accuracy might reach 90%, a few errors can significantly affect overall performance due to the lack of mechanisms to stabilize learning.

      In order to clarify the setup of the rule, we have added pseudocode in the revised manuscript (Pages 31-32) detailing how the learning rule and metaplasticity update synaptic weights based on calcium and dopamine signals. Additionally, we have included pseudocode for the inhibitory learning rule on Pages 34-35. In future work, we also aim to incorporate biologically plausible mechanisms, such as dopamine desensitization, to enhance stability.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      As now clearly stated in the introduction, the goal of the study is to see whether and to what quantitative extent the theoretical solution of the NFBP proposed in Tran-Van-Minh et al. (2015) can be achieved with biophysically detailed neuron models and with a biologically inspired learning rule. The problem has so far been solved with abstract and phenomenological neuron models (Schiess et al., 2014; Legenstein and Maass, 2011) and also with a detailed neuron model but with a precalculated voltage-dependent learning rule (Bicknell and Häusser, 2021).

      We have also tried to better explain the model mechanisms by adding supplementary figures.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The elevated [Ca²⁺]NMDA with minimal synaptic activation arises from high spine input resistance, small spine volume, and NMDA receptor conductance, which scales calcium influx with synaptic strength. Physiological studies report spine calcium transients typically up to ~1 μM (Franks and Sejnowski 2002, DOI: 10.1002/bies.10193), while our model shows ~7 μM for 0.625 nS and ~3 μM for 0.5 nS, exceeding this range. The calcium levels in the model might therefore be somewhat high compared with biologically measured levels. However, this does not impact the learning rule, as the functional dynamics of the rule remain robust across calcium variations.

      (2) In the distributed synapses session, the study introduces two new mechanisms "Threshold spillover" and "Accumulative spillover". Both mechanisms are not basic concepts but quantitative descriptions of them are missing.

      Thank you for your feedback. Based on the recommendations from Reviewer 1, we have simplified the paper by removing the "Accumulative spillover" and focusing solely on the "Thresholded spillover" mechanism. In the updated version of the paper, we refer to it only as glutamate spillover. However, we acknowledge (page 22, lines 40-42) that to create sufficient non-linearities, other mechanisms, like structural plasticity, might also be involved (although testing this in the model will have to be postponed to future work).

      (3) The learning rule achieves moderate performance when feature-relevant synapses are organized in pre-designed clusters, but for more general distributed synaptic inputs, the model fails to faithfully solve the simple task (with its performance of ~ 75%). Performance results indicate the learning rule proposed, despite its delicate design, is still inefficient when the spatial distribution of synapses grows complex, which is often the case on biological neurons. Moreover, this inefficiency is not carefully analyzed in this paper (e.g. why the performance drops significantly and the possible computation mechanism underlying it).

      The drop in performance when using distributed inputs (to a mean performance of 80%) is similar to the mean performance in the same situation in Bicknell and Hausser (2021), see their Fig. 3C. The drop occurs because i) the relevant feature combinations are not often colocalized on the same dendrite, where they could be strengthened together, and ii) even when they are, there may not be enough synapses to trigger the supralinear response from the branch spillover mechanism, i.e. the inputs are not summated in a supralinear way (Fig. 6B, most input configurations only reach 75%).

      Because of this, typically at most one relevant feature combination can be learned. In the cases where the random distribution of synapses is favorable for both relevant feature combinations to be learned, the NFBP is solved (Fig. 6B, where some performance lines reach 100%, and Fig. 6C, an example of such a case). We have extended the relevant sections of the paper to highlight the above-mentioned mechanisms.

      Further, the theoretical results in Tran-Van-Minh et al. 2015 already show that solving the NFBP with supralinear dendrites requires features to be pre-clustered in order to evoke the supralinear dendritic response, which would activate the soma. The same number of synapses distributed across the dendrites i) would not excite the soma as strongly, and ii) would summate in the soma as in a point neuron, i.e. no supralinear events can be activated, which are necessary to solve the NFBP. Hence, one doesn’t expect distributed synaptic inputs to solve the NFBP with any kind of learning rule.
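
      The clustering argument can be illustrated with a toy calculation. The sketch below is not the authors' biophysical model: it assumes a hypothetical sigmoidal branch nonlinearity (the names `branch_output` and `somatic_drive` and the parameters `threshold`, `gain` and `plateau` are all illustrative) and compares the same number of active synapses placed on one branch versus spread over four branches.

```python
import math

def branch_output(n_active, threshold=4.0, gain=2.0, plateau=10.0):
    """Toy branch response: a linear term plus a sigmoidal 'plateau' term.

    All parameters are hypothetical and chosen only for illustration;
    the plateau term switches on when n_active exceeds the threshold.
    """
    supralinear = plateau / (1.0 + math.exp(-gain * (n_active - threshold)))
    return n_active + supralinear

def somatic_drive(synapses_per_branch):
    """Sum the branch outputs at the soma (the soma itself is linear here)."""
    return sum(branch_output(n) for n in synapses_per_branch)

# Eight active synapses, either clustered on one branch or spread over four.
clustered = somatic_drive([8, 0, 0, 0])
distributed = somatic_drive([2, 2, 2, 2])

print(f"clustered drive:   {clustered:.2f}")
print(f"distributed drive: {distributed:.2f}")
```

      Under these assumed parameters the clustered arrangement crosses the branch threshold and recruits the plateau term, whereas the distributed arrangement stays near the linear sum expected for a point neuron.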

      (4) Figure 5B demonstrates that on average adding inhibitory synapses can enhance the learning capabilities to solve the NFBP for different pattern configurations (2, 3, or 4 features), but since the performance for excitatory-only setup varies greatly between different configurations (Figure 4B, using 2 or 3 features can solve while 4 cannot), can the results be more precise about whether adding inhibitory synapses can help improve the learning with 4 features?

      In response to the question, we added a panel to Figure 5B showing that without inhibitory synapses, 5 out of 13 configurations with four features successfully learn, while with inhibitory synapses, this improves to 7 out of 13. Figure 5—figure supplement 1B provides an explanation for this improvement (page 18, lines 10-24).

      (5) Also, in terms of the possible role of inhibitory plasticity in learning, as only on-site inhibition is studied here, can other types of inhibition be considered, like on-path or off-path? Do they have similar or different effects?

      This is an interesting suggestion for future work. We observed relevant dynamics in Figure 6A, where inhibitory synapses increased their weights on-site when randomly distributed. Previous work by Gidon and Segev (2012) examined the effects of different inhibitory types on NMDA clusters, highlighting the role of on-site and off-path inhibition in shunting. In our context, on-site inhibition in the same branch appears more relevant for maintaining compartmentalized dendritic processing.

      (6) Figure 6A is mentioned in the context of excitatory-only setup, but it depicts the setup when both excitatory and inhibitory synapses are included, which is discussed later in the paper. A correction should be made to ensure consistency.

      We have updated the figure and the text to make it clearer that simulations are run both with and without inhibition in this context (page 21, lines 4-13).

      (7) In the "Ca and kernel dynamics" plots (Fig 3,5), some of the kernel midlines (solid line) are overlapped by dots, e.g. the yellow line in Fig 3D, and some kernel midlines look like dots, which leads to confusion. Suggest to separate plots of Ca and kernel dynamics for clarity. 

      The design of the figures has been updated to improve the visibility of the calcium and kernel dynamics during training.

      (8) The formulations of the learning rule are not well-organized, and the naming of parameters is kind of confusing, e.g. T_min, T_max, which by default represent time, means "Ca concentration threshold" here.

      The abbreviations of the thresholds (T<sub>min</sub>, T<sub>max</sub> in the initial version) have been updated to CTL and CTH, respectively, to better reflect their role in tracking calcium levels. The mathematical formulations have further been reorganized for better readability. The revised Methods section now follows a more structured flow, first explaining the learning mechanisms, followed by the equations and their dependencies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Thank you for your positive view of our paper and for your previous comments.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      We agree that these are limitations of the existing study. We updated the limitations section as follows (page 15, line 539):

      “Similarly, this study falls short in several potential mechanistic insights, such as by investigating citation appropriateness via text similarity or international dynamics in authors who move between countries.”

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

      Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

      Thank you for your comments. We have addressed your suggestions presented in the “Recommendations for the authors” section by performing your recommended sensitivity analysis that specifically identifies authors who could be considered neurologists, neuroscientists, and psychiatrists (as opposed to just papers that are published in these fields). Please see the “Recommendations for the authors” section for more details.

      Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Thank you for your comment and for highlighting this insightful paper. After reading this paper, we believe that our theoretical estimand is descriptive in nature. For example, in the abstract of our paper, we state: “This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles from 63 journals between the years 2000-2020.” This goal seems consistent with the idea of a descriptive estimand, as we are not interested in any particular intervention or counterfactual at this stage. Instead, we seek to provide a broad characterization of subgroup differences in self-citations such that future work can ask more focused questions with causal estimands.

      Our analysis included subgroup means and generalized additive models, both of which were described as empirical estimands for a theoretical descriptive estimand in Lundberg et al. We added the following text to the paper (page 3, line 112):

      “Throughout this work, we characterized self-citation rates with descriptive, not causal, analyses. Our analyses included several theoretical estimands that are descriptive<sup>17</sup>, such as the mean self-citation rates among published articles as a function of field, year, seniority, country, and gender. We adopted two forms of empirical estimands. First, we showed subgroup means in self-citation rates. We then developed smooth curves with generalized additive models (GAMs) to describe trends in self-citation rates across several variables.”

      In addition, we added to the limitations section as follows (page 15, line 539):

      “Yet, this study may lay the groundwork for future works to explore causal estimands.”

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Thank you for your previous comments. We agree that they improved the paper.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough revisions and responses to the reviews.

      Reviewer #2 (Recommendations for the authors):

      I appreciate the authors' responses and am satisfied with all their replies except for my second comment. I still find the message conveyed slightly misleading, as the results seem to be generalized to neurologists, neuroscientists, and psychiatrists. It is important to refine the analysis to focus specifically on neuroscientists, identified as first or last authors based on their publication history. This approach is common in the science of science literature and would provide a more accurate representation of the findings specific to neuroscientists, avoiding the conflation with other related fields. This refinement could serve as a robustness check in the supplementary. I think adding this sub-analysis is essential to the validity of the results claimed in this paper.

      Thank you for your comment. We added a sensitivity analysis where fields are defined by an author’s publication history, not by the journal of each article.

      In the main text, we added the following:

      (Page 3, line 129) “When determining fields by each author’s publication history instead of the journal of each article, we observed similar rates of self-citation (Table S7). The 95% confidence intervals for each field definition overlapped in most cases, except for Last Author self-citation rates in Neuroscience (7.54% defined by journal vs. 8.32% defined by author) and Psychiatry (8.41% defined by journal vs. 7.92% defined by author).”

      Further details are provided in the methods section (page 21, line 801):

      “4.11 Journal-based vs. author-based field sensitivity analyses

      We refined our field-based analysis to focus only on authors who could be considered neuroscientists, neurologists, and psychiatrists. For each author, we looked at the number of articles they had in each subfield, as defined by Scopus. We considered 12 subfields that fell within Neurology, Neuroscience, and Psychiatry. These subfields are presented in Table S12. For each First Author and Last Author, we excluded them if any of their three most frequently published subfields did not include one of the 12 subfields of interest. If an author’s top three subfields included multiple broader fields (e.g., both Neuroscience and Psychiatry), then that author was categorized according to the field in which they published the most articles. Among First Authors, there were 86,220 remaining papers, split between 33,054 (38.33%) in Neurology, 23,216 (26.93%) in Neuroscience, and 29,950 (34.73%) in Psychiatry. Among Last Authors, there were 85,954 remaining papers, split between 31,793 (36.98%) in Neurology, 25,438 (29.59%) in Neuroscience, and 28,723 (33.42%) in Psychiatry.”
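
      The author-based classification described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' pipeline: the subfield names in `SUBFIELD_TO_FIELD` are placeholders (the actual 12 subfields are those in Table S12), the function name `classify_author` is hypothetical, the exclusion rule is read as "drop an author when none of their three most frequent subfields is a subfield of interest", and the final field is chosen by summing article counts over all in-scope subfields.

```python
from collections import Counter

# Placeholder mapping from subfield to broader field (assumption; see Table S12
# of the paper for the real list of 12 subfields).
SUBFIELD_TO_FIELD = {
    "Neurology (clinical)": "Neurology",
    "Cellular and Molecular Neuroscience": "Neuroscience",
    "Cognitive Neuroscience": "Neuroscience",
    "Psychiatry and Mental Health": "Psychiatry",
    "Biological Psychiatry": "Psychiatry",
}

def classify_author(subfield_counts, top_n=3):
    """Return the broader field for an author, or None if the author is excluded.

    subfield_counts maps subfield name -> number of articles by this author.
    """
    # The author's most frequently published subfields.
    top = [name for name, _ in Counter(subfield_counts).most_common(top_n)]

    # Assumed exclusion rule: none of the top subfields is of interest.
    if not any(name in SUBFIELD_TO_FIELD for name in top):
        return None

    # Sum article counts per broader field over in-scope subfields only.
    totals = Counter()
    for name, n_articles in subfield_counts.items():
        field = SUBFIELD_TO_FIELD.get(name)
        if field is not None:
            totals[field] += n_articles

    # Assign the field with the most articles.
    return totals.most_common(1)[0][0]

example_author = {
    "Cognitive Neuroscience": 10,
    "Psychiatry and Mental Health": 7,
    "Biological Psychiatry": 5,
    "Oncology": 2,
}
out_of_scope_author = {
    "Oncology": 9,
    "Immunology": 4,
    "Genetics": 3,
    "Cognitive Neuroscience": 1,
}
```

      With these made-up counts, `example_author` spans both Neuroscience and Psychiatry and is assigned to whichever field holds more of their articles, while `out_of_scope_author` is excluded because none of their top three subfields is in scope.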

      Reviewer #3 (Recommendations for the authors):

      I would like to thank the authors for their responses to the points that I raised. I do not have any new comments or further responses.

    1. Author response:

      We appreciate that the reviewers recognize the conceptual novelty of our work and find our work interesting.

      Reviewer #1:

      We thank Reviewer #1 for making us aware that the image presentation of some of what we see as very clear phenotypes in our work might not have been optimal in the reviewed pdf file, presumably due to the relatively low resolution and lack of appropriately magnified images in the merged pdf file. This issue, if not caught and corrected now, might have caused future readers to similarly not appreciate these clear phenotypes. We will carefully revise the figures and ensure maintenance of appropriate pdf resolution in the merged file so that image presentation is optimal and our findings are appropriately represented.

      We appreciate that Reviewer #1 carefully and critically assessed the growth cone transcriptomic data. We agree that future additional validation is warranted, and this will be clearly stated in our revised paper. Because we judge that these data – even in their current form – will be of potential interest to other investigators sooner rather than later, we respectfully offer and request that we should share them in this paper as our attempt so far to identify elements of the relevant growth cone biology, rather than waiting for years before completing additional validation.

      Even upon repeated reflection, we judge and respectfully submit that our CRISPR in utero electroporation experiments are, indeed, conducted with appropriate controls. We thought through the potential controls deeply prior to completing these complex experiments. We will describe our reasoning in detail in our point-by-point response.

      Reviewer #2:

      We thank Reviewer #2 for encouraging us to elaborate on the direction and cross-repressive interplay between Bcl11a and Bcl11b, which we previously identified (Woodworth*, Greig* et al., Cell Rep, 2016). We omitted deep discussion because we had already published this result, cited that work, and did not want to seem overly self-referential, as well as for reasons of length. Though we know and have reported that Bcl11a and Bcl11b are cross-repressive in SCPN development, we currently do not know whether increased Bcl11a expression in Bcl11b-null SCPN contributes to reduced Cdh13 expression. Also, we do not know if there is a similar Bcl11a-Bcl11b cross-repression in striatal medium spiny neurons. This will be clarified in our revised paper.

      We agree fully with the reviewer that “the common practice of picking from a list of differentially expressed genes the most likely ones” has been useful for and has substantially contributed to the elucidation of molecular mechanisms in many systems, including in CNS development. Indeed, the current paper identifies Cdh13 as a newly recognized functional molecule in SCPN axon development in part by using this approach. Cdh13 belongs to a well-known gene family, and its expression by SCPN was already reported by us (Arlotta*, Molyneaux* et al., Neuron, 2005). Despite these two facts, we newly identify its function in SCPN development, which has never been investigated or reported. We appreciate the reviewer encouraging us to elaborate on this here.

      Recent technical advances allow functional screening of a larger list of genes in vivo (Jin et al., Science, 2020; Ramani et al., bioRxiv, 2024; Zheng et al., Cell, 2024). That said, it is still a challenge to specifically access SCPN in vivo and apply such a high-throughput screening assay for axon development. We agree and predict that future work of this type will likely lead to identification of other new and unknown molecular regulators. We respectfully submit that our work reported here will provide a useful foundation for many such future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, caused by the presence of some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. Even though the model is appealing and several of the experimental data support some aspects of it, several inconsistencies remain to be solved. In addition, even though TopAI was shown to be an inhibitor of topoisomerase I (Yamaguchi & Inouye, 2015, NAR 43:10387), the authors suggest, without offering any experimental support, that, because ribosome-targeting antibiotics act as inducers, expression of the topAI/yjhQ/yjhP operon may confer resistance to these drugs.

      Strengths:

      - There is good experimental support of the transcriptional repression/activation switch aspect of the model, derived from well-designed transcriptional reporters and ChIP-qPCR approaches.

      - There is a clever use of the topAI-lacZ reporter to find the 23S rRNA mutants where expression of topAI was upregulated. This eventually led the authors to identify that translation events occurring at toiL are important to regulate the topAI/yjhQ/yjhP operon. Is there any published evidence that ribosomes with the identified mutations translate slowly (decreased fidelity does not necessarily mean slow translation, does it?)?

      G2253 is in helix 80 of the 23S rRNA, which has been proposed to be involved in correct positioning of the tRNA. Mutations in helix 80 have been reported to cause defects in peptidyl transferase center activity, which could reduce the rate of ribosome movement along the mRNA. If ribosomes are sufficiently slowed when translating toiL, this could induce expression of topAI. G1911 and Ψ1917 are in helix 69 of the 23S rRNA, which is involved in forming the inter-subunit bridge, as well as interactions with release factors. Mutations in helix 69 cause a decrease in the processivity of translation, suggesting that the mutations we identified may increase the occupancy of ribosomes within toiL, thereby inducing expression of topAI. We have added text to the Discussion section to include this speculation.

      - Authors incorporate relevant links to the antibiotic-mediated expression regulation of bacterial resistance genes. Authors can also mention the tryptophan-mediated ribosome stalling at the tnaC leader ORF that activates the expression of tryptophan metabolism genes through blockage of Rho-mediated transcriptional attenuation.

      We have added a citation to a recent structural study of ribosomes translating the tnaC uORF. Specifically, we speculate in the Discussion that toiL may have evolved to sense a ribosome-targeting antibiotic, or another ribosome-targeting small molecule such as an amino acid.

      Weaknesses:

      The main weaknesses of the work are related to several experimental results that are not consistent with the model, or related to a lack of data that needs to be included to support the model.

      The following are a few examples:

      - It is surprising that the authors do not mention that several published Ribo-seq data from E. coli cells show active translation of toiL (for example Li et al., 2014, Cell 157: 624). Therefore, it is hard to reconcile with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression (Figure 2C, bar graphs of the no antibiotic control samples).

      These data are for a topAI-lux reporter construct rather than toiL-lux. In our model, ribosome stalling within toiL is required to induce expression of the downstream genes; preventing translation of toiL by mutating the start codon or Shine-Dalgarno sequence would not cause ribosome stalling, consistent with the lack of an effect on topAI expression.

      - The SHAPE reactivity data shown in Figure 5A are not consistent with the toiL ORF being translated. In addition, it is difficult to visualize the effect of tetracycline on mRNA conformation with the representation used in Figure 5B. It would be better to show SHAPE reactivity without/with Tet (as shown in panel A of the figure).

      We have modified this figure (now Figure 6) so that we no longer show the SHAPE-seq data +/- tetracycline overlaid on the predicted RNA structure, since at best, the predicted structure likely represents only the uninduced state. We have included the predicted structure together with the SHAPE-seq data for untreated cells as a separate panel because it is part of the basis for our model. We have also added a supplementary figure showing a similar RNA structure prediction based on conservation of the topAI upstream region across species (Figure 6 – figure supplement 1), and we describe this in the text.

      - The "increased coverage" of topAI/yjhP/yjhQ in the presence of tetracycline from the Ribo-seq data shown in Figure 6A can be due to activation of translation, transcription, or both. For readers to know which of these possibilities apply, authors need to provide RNA-seq data and show the profiles of the topAI/yjhQ/yjhP genes in control/Tet-treated cells.

      A previous study (Li et al., 2014, PMID 24766808) compared RNA-seq and Ribo-seq data for E. coli to measure normalized ribosome occupancy for each gene. However, sequence coverage for topAI was too low to confidently quantify either the RNA-seq or the Ribo-seq data. Presumably RNA levels were low because of Rho termination. Hence, we were not confident that RNA-seq would provide information on the regulation of topAI-yjhQP. Other data in our study provide strong evidence that regulation is primarily at the level of translation. And the key conclusion from Figure 6 (now Figure 7) is that tetracycline stalls ribosomes on start codons.

      - Similarly, to support the data of increased ribosomal footprints at the toiL start codon in the presence of Tet (Figure 6B), authors should show the profile of the toiL gene from control and Tet-treated cells.

      Figure 6B shows data for both treated and untreated cells. The overall ribosome occupancy is much lower for untreated cells, making it difficult to draw strong conclusions about the relative distribution of ribosomes across toiL.

      - Representation of the mRNA structures in the model shown in Figure 5, does not help with visualizing 1) how ribosomes translate toiL since the ORF is trapped in double-stranded mRNA, and 2) how ribosome stalling on toiL would lead to the release of the initiation region of topAI to achieve expression activation.

      We now show the predicted structure with only SHAPE-seq data for untreated cells. The comparison of SHAPE-seq +/- tetracycline is shown without reference to the predicted structure.

      - The authors speculate that, because ribosome-targeting antibiotics act as expression inducers [by the way, authors should mention and comment that, more than a decade ago, it had been reported that kanamycin (PMID: 12736533) and gentamycin (PMID: 19013277) are inducers of topAI and yjhQ], the genes of the topAI/yjhQ/yjhP operon may confer resistance to these antibiotics. Such a suggestion can be experimentally checked by simply testing whether strains lacking these genes have increased sensitivity to the antibiotic inducers.

      We thank the reviewer for pointing out these references, which we now cite. The fact that another group found that gentamycin induces topAI expression – it is one of the most highly induced genes in that paper – strongly suggests that we missed the key inducing concentrations for one or more antibiotics, meaning that topAI is induced by even more ribosome-targeting antibiotics than we realized.

      We did some preliminary experiments to look for effects of TopAI, YjhQ, and/or YjhP on antibiotic sensitivity, but generated only negative results. Since these experiments were preliminary and far from exhaustive, we have chosen not to include them in the manuscript. Other studies of genes regulated by ribosome stalling in a uORF have looked at genes whose functions in responding to translation stress were already known, so the environmental triggers were more obvious. With so many possible triggers for topAI-yjhQP, it will likely require considerable effort to find the relevant trigger(s). Hence, we consider this an important question, but beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this important study, Baniulyte and Wade describe how the translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      I appreciate that the authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation. The results are convincing and clearly described.

      Weaknesses:

      I have relatively minor suggestions for improving the manuscript. These mainly relate to the figures.

      Reviewer #3 (Public Review):

      Summary:

      The authors nicely show that the translation and ribosome stalling within the ToiL uORF upstream of the co-transcribed topAI-yjhQ toxin-antitoxin genes unmask the topAI translational initiation site, thereby allowing ribosome loading and preventing premature Rho-dependent transcription termination in the topAI region. Although similar translational/transcriptional attenuation has been reported in other systems, the base pairing between the leader sequence and the repressed region by the long RNA looping is somehow unique in toiL-topAI-yjhQP. The experiments are solidly executed, and the manuscript is clear in most parts with areas that could be improved or better explained. The real impact of such a study is not easy to appreciate due to a lack of investigation on the physiological consequences of topAI-yjhQP activation upon antibiotic exposure (see details below).

      Strengths:

      Conclusion/model is supported by the integrated approaches consisting of genetics, in vivo SHAPE-seq and Ribo-Seq.

      Provide an elegant example of cis-acting regulatory peptides to a growing list of functional small proteins in bacterial proteomes.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Examine the consequences of mutations impeding translation of the topAI/yjhQ/yjhP operon on cell growth in the presence and absence of antibiotics.

      See response to Reviewer 1’s comment.

      (2) Resolve discrepancies between the SHAPE data indicating constitutive sequestration of the toiL Shine Dalgarno sequence with antibiotic-regulated translation of the toiL ORF.

      See response to Reviewer 1’s comment.

      (3) Reconcile published Ribo-Seq data with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression in the absence of antibiotics.

      See response to Reviewer 1’s comment.

      (4) Clarify whether antibiotic MIC values were employed to select antibiotic concentrations for different experiments.

      The antibiotic concentrations we used are in line with reported MICs for E. coli. We now list the reported ECOFFs/MICs and include relevant citations.

      (5) Provide RNA-seq data to complement the Ribo-Seq data for the topAI/yjhQ/yjhP genes in control vs. Tet-treated cells.

      See response to Reviewer 1’s comment.

      (6) Revise the text to address as many of the reviewers' suggestions as reasonably possible.

      Changes to the text have been made as indicated in the responses to the reviewers’ comments.

      Reviewer #2 (Recommendations for the Authors):

      (1) Page 6: I would have liked to have more information about the 39 suppressor mutations in rho. Do any of the cis-acting mutations give support for the model proposed in Figure 8?

      We only know the specific mutation for some of the strains, and we now list those mutations in the Methods section. For other mutants, we mapped the mutation to either the rho gene or to Rho activity, but we did not sequence the rho gene. Most of the specific mutations we did identify fall within the primary RNA-binding site of Rho and hence should be considered partial-loss-of-function mutations (complete loss of function would be lethal).

      We identified cis-acting mutations by re-transforming the lacZ reporter plasmid into a wild-type strain. We did not sequence any of these plasmids.

      (2) Page 12-13, Section entitled "Mapping ribosome stalling sites induced by different antibiotics": This section should start with a better transition regarding the logic of why the experiments were carried out and should end with an interpretation of the results.

      We have added a few sentences at the start of this section to explain the rationale. We have also added two sentences at the end of this section to summarize the interpretation of the data.

      (3) Page 15: The authors should discuss under what conditions the expression of TopAI (and YjhQ/YjhP) might be induced. Is expression also elevated upon amino acid starvation?

      We have looked through public RNA-seq data but have not identified growth conditions other than antibiotic treatment that induce expression of topAI, yjhQ or yjhP.

      (4) References: The authors should be consistent about capitalization, italics, and abbreviations in the references.

      These formatting errors will be fixed in the proofing stage.

      (5) All graph figures: There should be more uniformity in the sizes of individual data points (some are almost impossible to see) and error bars across the figures.

      We have tried to make the data points and error bars more visible for figures where they were smaller.

      (6) Figure 1B: I do not think the left arrow labeling is very intuitive and suggest renaming these constructs.

      We have removed the arrows to improve clarity.

      (7) Figure 2A: toiL should be introduced at the first mention of Figure 2A.

      We have added a schematic of the topAI-yjhQ-yjhP region as Figure 1A, including the toiL ORF, which we briefly mention in the text. We have opted to split Figure 2C into two panels. In Figure 2C we now only show data for the wild-type construct. Data for the mutant constructs are now shown in a new figure (Figure 5), alongside data for the wild-type constructs. We have simplified Figure 2A, since the mutations are not relevant to this revised figure, and we now show the schematic with the mutations as Figure 5A.

      (8) Figure 3C and 3D: I suggest giving these graphs headings (or changing the color of the bars in Figure 3D) to make it more obvious that different things are measured in the two panels.

      We have added headers to panels B-D to make it clear which graphs show ChIP-qPCR data and which graph shows qRT-PCR data.

      (9) Figure 6: It might be nice to show the topAI-yjhPQ operon here.

      We now show the operon in Figure 1A.

      (10) Figure 8: This figure could be optimized by adding 5' and 3' end labels and having more similarity with the model in Figure 7.

      The constructs shown in Figure 7 lack most of the topAI upstream region, so they aren’t readily comparable to the schematic in Figure 8. However, we have changed the color of the ribosome in Figure 7 to match that in Figure 8. We also indicate the 5’ end of the RNA in Figure 8.

      Reviewer #3 (Recommendations for the Authors):

      Areas to improve:

      (1) While it's important to learn about ToiL-dependent regulation of the downstream topAI-yjhQ toxin-antitoxin genes, the physiological consequence of topAI-yjhQ activation seems to be lost in the manuscript. Everything was done with a reporter lacZ/lux. In the absence of toiL translation (i.e. SD mutant) and/or ribosome stalling, does premature transcription termination result in non-stoichiometric synthesis of toxin vs. antitoxin, leading to growth arrest or other measurable phenotype? Knowing the impact of ToiL in the native topAI-yjhQ context will be valuable.

      See response to Reviewer 1’s comment.

      (2) It was indicated in Figure 4-figure supplement 1 that toiL homologs are found in many other proteobacteria. Do the UR sequences in those species also form a similar inhibitory RNA loop? The nt sequence identity of toiL is likely to be constrained by the base pairing of the topAI 5' region.

      We have added a supplementary figure panel showing an RNA structure prediction for the topAI upstream region based on sequence alignment of homologous regions from other species (Figure 6 – figure supplement 1).

      What is the genome-wide frequency of the MLENVII hepta-peptide in E. coli? Is the sequence disfavored to avoid spurious multi-antibiotic sensing?

      LENVII is not found in any annotated E. coli K-12 protein. However, this is a sufficiently long sequence that we would expect few to no instances in the E. coli proteome.
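
      The expectation behind this answer can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only: it assumes a proteome of about 1.35 million residues (an approximate figure for E. coli K-12, not a value taken from the manuscript) and uniform, independent amino-acid usage, which real proteins do not follow. The function name is hypothetical.

```python
# Rough expected number of exact matches to a short peptide in a proteome,
# assuming uniform, independent amino-acid usage (a deliberate simplification).

ALPHABET_SIZE = 20  # standard amino acids


def expected_peptide_hits(peptide: str, proteome_residues: int) -> float:
    """Upper-bound expectation: every residue is treated as a possible start site."""
    return proteome_residues / ALPHABET_SIZE ** len(peptide)


# ASSUMPTION: ~1.35 million residues is an illustrative proteome size,
# not a number reported by the authors.
PROTEOME_RESIDUES = 1_350_000

if __name__ == "__main__":
    for pep in ("LENVII", "MLENVII"):
        print(pep, expected_peptide_hits(pep, PROTEOME_RESIDUES))
```

      Under these assumptions both expectations fall well below one, which is consistent with the statement that few to no instances would be expected by chance.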

      (3) Figure 1A, it would be helpful to indicate the location of the toiL (red arrow as in Figure 2A) relative to the putative rut site early in the beginning of the results. Does TSS mark the transcription start site? There is no annotation of TSS in the figure legend. Was TSS previously mapped experimentally? Please include relevant citations.

      We now indicate the position of the TSS relative to the topAI start codon. Similarly, we indicate the position of the start of toiL relative to the topAI start codon in Figure 2A. We now explain “TSS” in the figure legend. There is a reference in the text for the TSS (Thomason et al., 2015).

      (4) Please consider rearranging the results section, perhaps more helpful to introduce the toiL in Figure 1 or earlier. The current format requires readers to switch back-and-forth between Figure 4 and Figure 2.

      We have added a schematic of the topAI upstream region as Figure 1A, and we have separated Figure 2C as described in a response to a comment from Reviewer 2.

      (5) Figure 2A and Figure 2-Figure Suppl 1A, for clarity, please mark the rut site upstream of the red arrow.

      Rather than mark the rut on Figure 2A, which would make for a busy schematic, readers can compare the positions of the rut to those of toiL, which we have now added to Figures 1B (formerly Figure 1A) and 2A.

      (6) The following conclusion seems speculative: "...but does not trigger termination until RNAP ..., >180 nt further downstream…". Shouldn't the authors already know where the termination site is based on their previous Term-seq data (see Ref 1, Adams PP et al 2021)?

      Sites of Rho-dependent transcription termination cannot be mapped precisely from Term-seq data because exoribonucleases rapidly process the unstructured RNA 3’ ends.

      (7) Genetic screen: Please discuss why the 23S rRNA mutations that cause translational infidelity could promote topAI translation. Wouldn't the mutant ribosome be affected in translating toiL?

      See response to Reviewer 1’s comment.

      (8) Although antibiotic concentrations were provided in Figure 2 legend, please provide the MIC values of each antibiotic, e.g., in Table S2, for the tested E. coli strain, to inform readers how specific subinhibitory concentrations were chosen.

      See response to Reviewing Editor.

      (9) Please clarify the calculation of luciferase units in the y-axis of Figure 2A, why the scale is drastically higher than that of Figure 7C using the same antibiotics?

      These reporter assays use different constructs. The reporter construct used for experiments in Figure 7 includes a portion of the ermCL gene and associated downstream sequence. We have enlarged Figure 7A to highlight the difference in reporter constructs.

      (10) Table S4 needs a few more details. It is unclear how those numbers in columns G-H were generated. Do those numbers correspond to ribosome density per nt/ORF?

      We have added footnotes to Table S4 to indicate that the numbers in columns G and H represent sequence read coverage normalized by region length and by the upper quartile of gene expression.
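
      As an illustration of the normalization described in this response, the following minimal sketch divides read counts by region length and then by the upper quartile of the per-nucleotide coverage values. It is not the authors' pipeline: the function name, the input dictionaries, and the choice to compute the upper quartile over all supplied regions with the "inclusive" quantile method are assumptions made for the example.

```python
from statistics import quantiles


def normalize_coverage(read_counts: dict, lengths: dict) -> dict:
    """Length-normalize read counts, then scale by the upper quartile.

    ASSUMPTION: the upper quartile is taken over all regions passed in;
    the manuscript may define the reference gene set differently.
    """
    # Coverage per nucleotide for each region.
    density = {name: read_counts[name] / lengths[name] for name in read_counts}
    # Third quartile (75th percentile) of the per-nucleotide coverage values.
    upper_quartile = quantiles(density.values(), n=4, method="inclusive")[2]
    return {name: value / upper_quartile for name, value in density.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    counts = {"a": 10, "b": 40, "c": 90, "d": 160, "e": 250}
    lengths = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
    print(normalize_coverage(counts, lengths))
```

      Scaling by the upper quartile rather than by total reads makes the values less sensitive to a handful of very highly expressed genes.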

      (11) Figure 5, if the SHAPE results were true, the Shine Dalgarno sequence of toiL is sequestered in the hairpin structure with and without tetracycline treatment. It is inconceivable that translational initiation will occur efficiently, please discuss.

      Our representation of the SHAPE-seq data was confusing since we overlaid the SHAPE-seq changes on a predicted structure that likely corresponds to the uninduced state. We hope that the new version of Figure 5 is clearer.

      We presume the reviewer is referring to the Shine-Dalgarno sequence of topAI rather than toiL, since the Shine-Dalgarno sequence of toiL is predicted to be unstructured even in the absence of tetracycline treatment. The ribosome-binding site of topAI is more accessible in cells treated with tetracycline, although the SHAPE-seq data suggest that this is a transient event. The binding of the initiating ribosome may also reduce reactivity in this region under inducing conditions. We now discuss this briefly in the text.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long read sequencing on a subset of isolates (ST10 and ST74), and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophage compared to ST10, but both STs induced comparable cytotoxicity levels. Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors likely associated with the observed differences. The study provides a comprehensive and novel understanding on the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. The methodology included in both approaches were sound and written in sufficient detail, and data analysis were performed with rigour. Source data were fully presented and accessible to readers. 

      Comments on revised version: 

      The authors have addressed all the points raised by the reviewer. The manuscript is now much enhanced in clarity and accuracy. The re-written Discussion is more relevant and brings in comparison with other invasive Salmonella serotypes. 

      Comments: 

      In light of the metadata supplied in this revision, for Australian isolates, all human cases of ST74 (n=7) were from faeces (presumably from gastroenteritis), while 18/40 of ST10 were from invasive specimens (blood and abscess). This may contradict the manuscript's findings and discussion of the different experimental phenotypes of the two STs, with ST74 showing more replication in macrophages and being potentially more invasive. Thus, the reviewer suggests that the authors mention this disparity in the Discussion and discuss possible underlying reasons. This can strengthen the authors' rationale for further in vivo studies. 

      We thank the reviewer for pointing out this important observation. We have amended the text in the Discussion to address the differences in source of human cases as suggested by the Reviewer (lines 392-430). We have also included text highlighting the important knowledge gaps in understanding the drivers for emerging iNTS with broad host ranges and identify future avenues of research that could be explored to better understand the observed differences in the host-pathogen interactions.  

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understand its evolution. The phenotyping of isolates of ST10 and ST74 also offer insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences. 

      We thank the reviewer for their comments and acknowledge the limitations of this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors repeatedly assert that an individual's behavior in the foraging assay depends on its prior history (particularly cultivation conditions). While this seems like a reasonable expectation, it is not fully fleshed out. The work would benefit from studies in which animals are raised on more or less abundant food before the behavioral task.

      Cultivation density: While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is an interesting experiment, it is not feasible at this time. We previously attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. Thus, we focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction (lines 618-624).

      (2) The authors convincingly show that the probability of particular behavioral outcomes occurring upon patch encounter depends on time-associated parameters (time since last patch encounter, time since last patch exploitation). There are two concerns here. First, it is not clear how these values are initialized - i.e., what values are used for the first occurrence of each behavioral state? More importantly, the authors don't seem to consider the simplest time parameter, the time since the start of the assay (or time since worm transfer). Transferring animals to a new environment can be associated with significant mechanical stimulus, and it seems quite possible that transferring animals causes them to enter a state of arousal. This arousal, which certainly could alter sensory function or decision-making, would likely decay with time. It would be interesting to know how well the model performs using time since assay starts as the only time-dependent parameter.

      Parameter Initialization: We thank the reviewer for pointing out an oversight in our methods section regarding the model parameter values used for the first encounter. We clarified the initialization of parameters in the manuscript (lines 1162-1179). In short, for the first patch encounter where k = 1:

      ρ<sub>k</sub> is the relative density of the first patch.

      τ<sub>s</sub> is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρ<sub>h</sub> is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD<sub>600</sub> = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρ<sub>e</sub> is equivalent to ρ<sub>h</sub>.

      Transfer Method: We thank the reviewer for their thoughtful comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We anticipated this possibility and, in order to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observed no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick. We added these additional methodological details to the methods (lines 791-796).

      Time Parameter: However, the reviewer’s concern that the simplest time parameter (time since start of the assay) might better predict animal behavior is valid. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. To test this hypothesis, we ran our model with varying combinations of the satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. We found that when both terms were included in the model, the coefficient of the transfer term was non-significant. This result suggests that the relevant time-dependent term is more likely related to satiety than transfer-induced stress (lines 343-358; Figure 4 - supplement 4D).
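
      One way to picture the comparison described here is as a logistic model in which the satiety and transfer terms compete as covariates. The form below is a sketch under stated assumptions rather than the authors' exact specification: the coefficients β and the purely additive, untransformed combination of covariates are assumptions, while ρ<sub>k</sub>, τ<sub>s</sub>, ρ<sub>h</sub>, ρ<sub>e</sub> and τ<sub>t</sub> are the quantities named in the response.

```latex
\[
P(\text{accept at encounter } k)
  = \frac{1}{1 + \exp\!\big[-\big(\beta_0 + \beta_k \rho_k + \beta_s \tau_s
      + \beta_h \rho_h + \beta_e \rho_e + \beta_t \tau_t\big)\big]}
\]
```

      In this notation, the reported result corresponds to failing to reject β<sub>t</sub> = 0 when τ<sub>s</sub> and τ<sub>t</sub> are both included in the model.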

      (3) Similarly, Figures 2L and M clearly show that the probability of a search event occurring upon a patch encounter decreases markedly with time. Because search events are interpreted as a failure to detect a patch, this implies that the detection of (dilute) patches becomes more efficient with time. It would be useful for the authors to consider this possibility as well as potential explanations, which might be related to the point above.

      Time-dependent changes in sensing: We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we added this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect (lines 563-568).

      (4) Based on their results with mec-4 and osm-6 mutants, the authors assert that chemosensation, rather than mechanosensation, likely accounts for animals' ability to measure patch density. This argument is not well-supported: mec-4 is required only for the function of the six non-ciliated light-touch neurons (AVM, PVM, ALML/R, PLML/R). In contrast, osm-6 is expected to disrupt the function of the ciliated dopaminergic mechanosensory neurons CEP, ADE, and PDE, which have previously been shown to detect the presence of bacteria (Sawin et al 2000). Thus, the paper's results are entirely consistent with an important role of mechanosensation in detecting bacterial abundance. Along these lines, it would be useful for the authors to speculate on why osm-6 mutants are more, rather than less, likely to "accept" when encountering a patch.

      Sensory mutant behavior: We thank the reviewer for pointing out the error in our interpretation of the behavior of osm-6 and mec-4 animals. We further elaborated on our findings and edited the text to better reflect that osm-6 mutants lack both chemosensory and mechanosensory ciliated sensory neurons (lines 406-448; lines 567-577). Specifically, we provided some commentary on the finding that osm-6 mutants show an augmented ability to detect the presence of bacterial patches but a reduced ability to assess their bacterial density. While this finding seems contradictory, it suggests that in the absence of the ability to assess bacterial density, animals must prioritize exploiting food resources when available.

      (5) While the evidence for the accept-reject framework is strong, it would be useful for the authors to provide a bit more discussion about the null hypothesis and associated expectations. In other words, what would worm behavior in this assay look like if animals were not able to make accept-reject decisions, relying only on exploit-explore decisions that depend on modulation of food-leaving probability?

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to further extrapolate upon our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      Reviewer #3 (Public review):

      (1) Sensing vs. non-sensing

      The authors claim that when animals encounter dilute food patches, they do not sense them, as evidenced by the shallow deceleration that occurs when animals encounter these patches. This seems ethologically inaccurate. There is a critical difference between not sensing a stimulus, and not reacting to it. Animals sense numerous stimuli from their environment, but often only behaviorally respond to a fraction of them, depending on their attention and arousal state. With regard to C. elegans, it is well-established that their amphid chemosensory neurons are capable of detecting very dilute concentrations of odors. In addition, the authors provide evidence that osm-6 animals have altered exploit behaviors, further supporting the importance of amphid chemosensory neurons in this behavior.

      Interpretation of “non-sensing” encounters: We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that  “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 8A-C and Patch encounter classification as sensing or non-responding in Methods). Regardless, we agree with the reviewer that all that can be asserted about these events is that animals do not appear to respond to the bacterial patch in any way that we measured. Therefore, we have replaced the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events and clarified the text to reflect this change (lines 193-200; lines 211-212).

      (2) Search vs. sample & sensing vs. non-sensing

      In Figures 2H and 2I, the authors claim that there are three behavioral states based on quantifying average velocity, encounter duration, and acceleration, but I only see two. Based on density distributions alone, there really only seem to be 2 distributions, not 3. The authors claim there are three, but to come to this conclusion, they used a QDA, which inherently is based on the authors training the model to detect three states based on prior annotations. Did the authors perform a model test, such as the Bayesian Information Criterion, to confirm whether 2 vs. 3 Gaussians is statistically significant? It seems like the authors are trying to impose two states on a phenomenon with a broad distribution. This seems very similar to the results observed for roaming vs. dwelling experiments, which again, are essentially two behavioral states.

      Validation of sensing clusters: We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the semi-supervised QDA approach. We added additional visualizations and methods to validate the clusters we have discovered. Specifically, we used Silverman’s test to show that the sensing vs. non-responding data were bi-modal (i.e. a two-cluster classification method fits best) and accompanied this statistical test with heat maps which better illustrate the clusters (lines 171-173; lines 190-191; lines 948-972; lines 1003-1005; Figure 2 - supplement 6A-C; Figure 2 - supplement 7C-F).

      Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit). It’s important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (now changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-responding exploratory events), resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-responding).
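
      The way the two binary classifications combine into three encounter types can be sketched as follows. This is only an illustration of the labeling logic: the boolean inputs stand in for the outputs of the GMM and the semi-supervised QDA, neither of which is fitted here, and the function name is hypothetical.

```python
def encounter_type(is_exploit: bool, is_sensing: bool) -> str:
    """Combine two binary labels into one of three encounter types.

    is_exploit: stand-in for the explore/exploit label (GMM in the response).
    is_sensing: stand-in for the sensing/non-responding label (QDA in the response).
    """
    if is_exploit:
        # The exploit cluster is not subdivided further.
        return "exploit"
    # Only the explore cluster is split by the second classifier.
    return "sample" if is_sensing else "search"


if __name__ == "__main__":
    for labels in [(True, True), (False, True), (False, False)]:
        print(labels, encounter_type(*labels))
```

      Because the second label is only consulted for exploratory encounters, two two-way classifiers yield three categories rather than four.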

      (4) History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      Model design: We thank the reviewer for their thoughtful comments on the model. We completed a number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets) and found that the problem of model selection was compounded by the enormous array of highly-correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study.
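
      For reference, the two selection criteria named above are conventionally defined as shown below, where k is the number of free parameters, n the number of observations, and L̂ the maximized likelihood. These are the standard textbook definitions, not expressions taken from the manuscript.

```latex
\[
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
\]
```

      Lower values indicate a better trade-off between fit and complexity. Because the BIC penalty grows with ln n, it favors smaller models than AIC on large datasets, which is one reason the two criteria can select different sets of terms.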

      Lastly, with regard to the use of only sensed patches in the model: while we acknowledge that we are not certain as to whether the “non-responding” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events. We have added additional commentary about our model to the discussion section (lines 667-695).

      (5) osm-6

      The osm-6 results are interesting. This seems to indicate that the worms are still sensing the food, but are unable to assess quality, therefore the default response is to exploit. How do you think the worms are sensing the food? Clearly, they sense it, but without the amphid sensory neurons, and not mechanosensation. Perhaps feeding is important? Could you speculate on this?

      We thank the reviewer for their thoughtful remarks. We have added additional commentary about the result of our sensory mutant experiments as described above in response to Reviewer #1 under Sensory mutant behavior.

      (7) Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think this research could be strengthened by a reassessment of their model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors title the work as an "ethological study" and emphasize the theme of "foraging in naturalistic environments" in contrast to typical laboratory conditions. The only difference in this study relative to typical laboratory conditions is that the food bacteria is distributed in many small patches as compared to one large patch. First, it is not clear to the reviewer that the size of the food patches in these experiments is more relevant to C. elegans in its natural context than the standard sizes of food patches. Furthermore, all the other highly unnatural conditions typical of laboratory cultivation still apply: the use of a 2D agar substrate, a single food bacteria that is not a component of a naturalistic diet, and the use of a laboratory-adapted strain of C. elegans with behavior quite distinct from that of natural isolates. The reviewer is not suggesting that the authors need to make their experiments more naturalistic, only that the experiments as described here should not be described as naturalistic or ethological as there is no support for such claims.

      Ethological interpretation: We thank the reviewer for their comments about the use of the term ethological to describe this study. We chose to develop a patchy bacterial assay to mimic the naturalistic “boom-or-bust” environment. While we agree with the reviewer that we do not know if the size and distribution of the food patches in these experiments is more relevant to C. elegans, we maintain that these experiments were ecologically-inspired and revealed behavior that is difficult to observe in environments with large, densely-seeded bacterial patches. We have updated our text to better reflect that this study was “ecologically-inspired” rather than truly “ethological” in nature (lines 94, 693).

      The main finding of the paper is that worms explore and then exploit, i.e. they frequently reject several bacterial patches before accepting one. This result requires additional scrutiny to reject other possible interpretations. In particular, when worms are transferred to a new plate we would expect some period of increased arousal due to the stressful handling process. A high arousal state might cause rejection of food patches. Could the measured accept/reject decisions be influenced by this effect? One approach to addressing this concern would be to allow the animals to acclimate to the new plate on a bare region before encountering the new food patches.

      We thank the reviewer for their comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We addressed this above in response to Reviewer #1 under Transfer Method and Time Parameter. In brief, we used a worm picking method that mitigated stress and added additional analyses showing that a transfer-related term was less predictive than a satiety-related term.

      Related to the above, in what circumstances exactly are the authors claiming that worms first explore and then exploit? After being briefly deprived of food? After being handled?

      Explore-then-exploit: All animals were well-fed and handled gently as described above under Transfer Method (lines 787-795). Our results suggest that the appearance of an explore-then-exploit strategy is a byproduct of being transferred from an environment with high bacterial density to an environment with low bacterial density as described in the manuscript (lines 461-466).

      The authors emphasize their analysis of the accept/reject decision as a critical innovation. However, the accept/reject decision does not strike me as substantially different from the previously described stay/switch decision. When a worm encounters a new patch of bacteria, accepting this bacteria is equivalent to staying on it and rejecting (leaving) it is equivalent to switching away from it. The authors should explain how these concepts are significantly distinct.

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our view of this somewhat philosophical distinction and on what it predicts about C. elegans behavior (lines 507-533).

      During patch encounter classification, the authors computed three of the animals' behavioral metrics (Line 801-804) and claimed that the combination of these three metrics reveals two non-Gaussian clusters representing encounters where animals sensed the patch or did not appear to sense the patch. The authors also refer to a video to demonstrate the two clusters by rotating the 3-dimensional scatter plot. However, the supposed clusters, if any, are difficult to see in either the 3D (Video 5) or the 2D scatter plot (Figure 3I). The authors need to clearly demonstrate the distinct clustering as claimed in the paper as this feature is fundamental and necessary for the model implementation and interpretation of results.

      We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters. We added additional visualizations and methods to validate the clusters we have discovered as described in our above response to Reviewer #3 under Validation of sensing clusters.

      When selecting parameters (covariates) for their model, it is critical to avoid overfitting. Therefore, the authors used AIC and BIC (Figure 4- supplement 1) to demonstrate that the full GLM model has a better model performance than the other models which contain only a subset of the full covariates (in a total of 5). However, the authors compare the full set with only 4 other models whereas the total number of models that need to be compared with is 2^5-2. The authors at least need to include the AIC and BIC scores of all possible models in order to draw the conclusion about the performance of the full model.

      Model selection criterion: We thank the reviewer for pointing out this gap in our methodology. We have now run the model with every subset of the model covariates and have confirmed that the model with all 5 covariates outperforms all other models even when using BIC, the strictest criterion for overfitting (Figure 1 - supplement 1A). The only other model that performs well (though not as often as the 5-term model) is the 4-term model lacking ρ<sub>h</sub>. This result is not surprising as ρ<sub>h</sub> only changes substantially once in an animal’s encounter history for the single-density, multi-patch data that this model was fit to. For example, for an animal foraging on patches of density 10, on the first encounter ρ<sub>h</sub> = ~200 (see Parameter initialization above), but on every subsequent encounter ρ<sub>h</sub> = ~10. As a result, the effect of ρ<sub>h</sub> on the probability of exploiting is somewhat binary on the single-density, multi-patch data set. Nevertheless, we see significantly improved prediction of behavior in the novel multi-density, multi-patch data (Figure 4F) as we observe an effect of the most recently encountered patch. Additionally, we observe a similar impact (i.e., significant coefficient of negative sign) of the ρ<sub>h</sub> term when the model is fit to the multi-density, multi-patch data set (Figure 4 - supplement 4D).
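
      The exhaustive comparison described above can be illustrated with a minimal sketch. It only enumerates the candidate covariate subsets and defines the standard AIC and BIC formulas; it does not fit the GLM. The covariate names (`c1` to `c5`) are placeholders, since the response only states that there are 5 covariates, and the log-likelihood of each fitted model is assumed to come from whatever fitting routine is used.

```python
from itertools import combinations
from math import log

# Placeholder names: the response states there are 5 covariates but this
# sketch does not assume which ones they are.
COVARIATES = ["c1", "c2", "c3", "c4", "c5"]


def all_subsets(covariates):
    """Return every non-empty subset of the covariates (2^n - 1 in total)."""
    return [
        subset
        for size in range(1, len(covariates) + 1)
        for subset in combinations(covariates, size)
    ]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * log(n_obs) - 2 * log_likelihood


subsets = all_subsets(COVARIATES)
# Reduced models to compare against the full 5-covariate model.
reduced = [s for s in subsets if len(s) < len(COVARIATES)]
```

      With 5 covariates this yields 31 non-empty subsets, of which 30 are reduced models, matching the 2^5-2 comparisons the reviewer requested. BIC penalizes each additional parameter by ln(n) rather than 2, so it is the stricter criterion whenever the number of observations exceeds about 7.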

      In any bacterial patch, the edges have a higher density of bacteria than the patch center. Thus, it is possible that a worm scans the patch edge density, on the basis of which it decides to accept or reject the patch whose average density is smaller. This could potentially cause an underestimate of the bacteria density used in the model. Furthermore, the potential inhomogeneity of the patch may further complicate the worm's decision-making, and the discrepancy between the reality and the model assumption will reduce the validity of the model. The authors need to estimate the inhomogeneity of the bacterial patches used in their assays and discuss how the edge effects may affect their results and conclusions.

      Bacterial patch inhomogeneity: We extensively tested the landscape of the bacterial patches by imaging fluorescently-labeled bacteria OP50-GFP (Bacterial Patch Density in Methods; Figure 2 - supplement 1-3). As the reviewer mentions, we observe significantly greater bacterial density at the patch edge. This within-patch spatial inhomogeneity results from areas of active proliferation of bacteria and likely complicates an animal’s ability to accurately assess the quantity of bacteria within a patch and, consequently, our ability to accurately compute a metric related to our assumptions of what the animal is sensing. In our study, we used the relative density of the patch edge where bacterial density is highest as a proxy for an animal’s assessment of bacterial patch density (Figure 2 – supplement 1). This decision was based on a previous finding that the time spent on the edge of a bacterial patch affected the dynamics of subsequent area-restricted search. While within-patch spatial inhomogeneity likely affects an animal’s ability to assess patch density, we do not believe that this qualitatively affects the results of our study. Both the patch densities tested (Figure 2 – supplement 3A) as well as our observations of time-dependent changes in exploitation (Figure 2E,N-O; Figure 3H-I) maintained a monotonic relationship. Therefore, alternative methods of patch density estimation should yield similar results. We have added additional discussion on this topic to our manuscript (lines 578-593).

      The authors claim that their methods (GMM and semi-supervised QDA) are unbiased. This seems unlikely as the QDA involves supervision. The authors need to provide additional explanation on this point.

      Semi-supervised QDA labelling: We have removed the term “unbiased” to avoid any misinterpretation of the methodology and clarified our method of labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory, as we assume that sensation is a prerequisite for exploitation, and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD<sub>600</sub> = 0). All other points were unlabeled prior to learning the model. In this way, our labels were based on the experimental design and results of the GMM, an unsupervised method, rather than on any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters, by minimizing the posterior variance of classification (lines 1012-1021). See Figure 2 - supplement 8A-B for a visualization showing the labelled data.
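
      The two labelling assumptions above can be written as a short rule. This is a minimal sketch of the initial labelling step only, not of the QDA fit itself. The function name, the label constants, and the choice to check exploitation before bacterial density are illustrative assumptions; the `exploited` flag is assumed to come from an upstream classification (the GMM, per the response).

```python
# Illustrative label values (not taken from the manuscript).
SENSING, NON_SENSING, UNLABELED = 1, 0, -1


def initial_label(exploited, od600):
    """Assign the starting label for one patch encounter.

    exploited: True if the encounter was classified as exploitation.
    od600: optical density of the bacterial patch (0 means no bacteria).
    """
    if exploited:
        # Assumption 1: an exploited patch must have been sensed.
        return SENSING
    if od600 == 0:
        # Assumption 2: a patch without bacteria cannot be sensed.
        return NON_SENSING
    # Everything else is left for the semi-supervised step to classify.
    return UNLABELED
```

      For example, `initial_label(False, 5)` returns `UNLABELED`, so a non-exploited encounter with a bacteria-containing patch receives no label until the classifier is learned.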

      Based on the authors' result, worms behaviorally exhibit their preferences toward food abundance (density), which results in a preference scale for a range of densities. Does this scale vary with the worms' initial cultivation states? The author partially verified that by observing starved worms. This hypothesis could be better tested if the authors could analyze the decision-making of the worms that were initially cultivated with different densities of bacterial food.

      While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is a very interesting experiment, it is not feasible at this time. We focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction as described above in our response to Reviewer #1 under Cultivation density.

      It would be helpful to elaborate more on how the framework developed in this paper can be applied more broadly to other behaviors and/or organisms and how it may influence our understanding of decision-making across species.

      We thank the reviewer for alerting us to this gap in our discussion. We have added additional commentary about our model and its utility to the discussion section (lines 667-695).

      Reviewer #3 (Recommendations for the authors):

      Sensing vs. non-sensing

      Perhaps a more ethologically accurate term to describe this behavior would be "ignoring" rather than "not sensing". If the authors feel strongly about using the term "not sensing", then they should provide experimental evidence supporting this claim. However, I think simply changing the terminology negates these experiments.

      We thank the reviewer for their thoughtful comments. While we agree with the reviewer that the term “non-sensing” may not be ethologically accurate (see response to Public Review above under Interpretation of “non-sensing” encounters), we interpret the term “ignoring” to mean that the animal sensed the patches but decided not to react. We have chosen to replace the term “non-sensing” with “non-responding” to best indicate the ethological interpretation of our observation. Nonetheless, we believe that it remains possible that animals are truly not sensing the bacterial patches as our method of classification compared the behavior against encounters with patches lacking bacteria (as described above in response to Reviewer #2 under Semi-supervised QDA labelling).

      History-dependence of the GLM

      Perhaps a simpler approach would be to say the worm senses everything, and this accumulative memory affects the decision to exploit. For example, the animal essentially experiences two feeding states: feeding on patches, and starvation off of patches.

      The level of satiety could be modeled linearly:

      Satiety(t_enter:t_leave) = k_feed*patch_density*delta_t

      Where k_feed is some model parameter for rate of satiety signal accumulation, t_enter is the time the animal entered the patch, t_leave is the time the animal left the patch, and delta_t is the difference between the two. Perhaps you could add a saturation limit to this, but given your data, I doubt that is the case.

      Starvation could be modeled as simply a decay from the last satiety signal:

      Starvation(t_leave:t_enter) = Satiety(t_leave)*exp(-k_starve*delta_t).

      Where k_starve is the rate constant for the decay of the satiety signal.

      For the logistic model, the logistic parameter is simply the difference between the current patch density and the current satiety signal.

      A nice thing about this approach is that it negates the need to categorize your patches. All patch encounters matter. Brief patch encounters (categorized as non-sensing and not used in the prior GLM) naturally produce a very small satiety signal and contribute very little to the exploit decision. Another nice thing about this approach is that it gives you memory timescales that are testable. There is a rate of satiety accumulation and a rate of satiety loss. You should be able to predict behavior with lower patch density, assuming the rate constants hold. (I am not advocating you do more experiments here, just pointing out a nice feature of this approach).
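
      The reviewer's proposal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a fitted model: satiety gained on a patch is assumed to add to the satiety present at patch entry (the formulas above do not say how successive encounters combine), the logistic parameter is used without any scaling coefficient or intercept, and all function names are hypothetical.

```python
from math import exp


def satiety_after_feeding(satiety_at_entry, k_feed, patch_density, delta_t):
    """Linear accumulation on a patch: gain = k_feed * patch_density * delta_t.

    Adding the gain to the satiety at entry is an assumption of this sketch.
    """
    return satiety_at_entry + k_feed * patch_density * delta_t


def satiety_after_starving(satiety_at_leave, k_starve, delta_t):
    """Exponential decay of the satiety signal while off patch."""
    return satiety_at_leave * exp(-k_starve * delta_t)


def p_exploit(patch_density, satiety):
    """Logistic probability of exploiting, driven by density minus satiety."""
    return 1.0 / (1.0 + exp(-(patch_density - satiety)))
```

      Because a brief encounter has a small `delta_t`, it adds little satiety and therefore barely shifts `p_exploit` on the next encounter, which is the property the reviewer highlights.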

      You could possibly apply this to a GLM for velocity on a non-exploited patch as well, though I assume this would be a linear GLM, given the velocity distributions you provided.

      We thank the reviewer for their time and thoughtfulness in thinking about our model. The reviewer’s proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making. However, we decided to keep our paper focused on using a minimal model to answer a set of core questions (e.g., Does encounter history or satiety influence decision-making?) (see above under Model design for a more detailed response). Future studies investigating the mechanisms of these foraging decisions should open the door for more mechanistically accurate models. We have expanded our discussion of the model to include this assertion (lines 667-695).

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sample size: If the sample size of the study is increased, more confidence and new insights can be inferred about myometrial enhancer-mediated gene regulation in term pregnancy. Such a small sample size (N = 3) limits the statistical power of the study. As mentioned in the manuscript, the failure to identify chromatin loops in the second subject's biopsy was due to limited sample material.

      We agree with the reviewer’s comment about the sample size. We sincerely hope that the results of this study will increase the interest of stakeholders in funding future projects on a larger scale.

      (2) Figure quality: There is a lack of good representations of the results (e.g., screenshots of tables as figure panels!) as well as missing interpretations that might add value to the manuscript.

      Figures 1B and 2B have been converted to pie charts.

      (3) Definition of super-enhancer: The definition of super-enhancer is not clear. Also, the computational merging of enhancers to define super-enhancers should be described better.

      We have added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
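
      The merging rule quoted in the Methods text above (enhancers within 12.5 kb merged, merged regions larger than 15 kb called super enhancers) can be sketched in pure Python. This is an illustration intended to mimic the described `bedtools merge -d 12500` step, not the authors' pipeline: the function names are hypothetical, coordinates are assumed to be BED-style half-open intervals, treating a gap of exactly 12,500 bp as mergeable is an assumption about the boundary case, and the final step of keeping super enhancers common to multiple samples is not shown.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge (chrom, start, end) intervals on the same chromosome whose
    gap is at most max_gap bp (overlapping intervals are always merged)."""
    merged = []
    for chrom, start, end in sorted(intervals):
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            # Extend the current merged region.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def call_super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions strictly larger than min_size bp."""
    return [
        region
        for region in merge_within(intervals, max_gap)
        if region[2] - region[1] > min_size
    ]


# Made-up example peaks, for illustration only.
peaks = [
    ("chr1", 1_000, 3_000),
    ("chr1", 10_000, 12_000),
    ("chr1", 20_000, 22_000),
    ("chr1", 100_000, 101_000),
    ("chr2", 500, 2_500),
]
super_enhancers = call_super_enhancers(peaks)
```

      In this made-up example the first three chr1 peaks are chained into one 21 kb region that passes the size filter, while the isolated peaks do not.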

      (4) Assay-Specific Limitations: Each assay employed in the study, such as ChIP-Seq and CRISPRa-based Perturb-Seq, has its limitations, including potential biases, sensitivity issues, and technical challenges, which could impact the accuracy and reliability of the results. These limitations should be addressed properly to avoid false-positive results and improve the interpretability of the results.

      The major limitations of the CRISPRa-based Perturb-Seq protocol in this study are the use of hTERT-HM cells and the two-vector system for transduction. While hTERT-HM cells are a much easier platform in terms of technical operation, primary human myometrial cells are generally considered to retain a molecular context that is closer to that of in vivo tissues. Because of the limited efficiency of having two vectors simultaneously present in the same cell, hTERT-HM cells are a much more affordable and operationally feasible platform for this experiment. Future advances in viral vector payload capacity may overcome this challenge and open an avenue to performing the assay in primary human myometrial cells.

      (5) Sample collection and comparison: There is mention of matched gravid term and non-gravid samples whereas no description or use of control samples was found in the results. Also, the comparison of non-labor samples with labor samples would provide a better understanding of epigenomic and transcriptomic events of myometrium leading to laboring events.

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Lack of clarity:

      (6a) It is written as 'Chromatin Conformation Capture (Hi-C)'. I think Hi-C is Histone Capture and 3C is Chromosome Conformation Capture! This needs clear writing.

      As the reviewer suggested, to make it clear, we have changed the text “A high throughput chromatin conformation capture (Hi-C) assay” to “A High-throughput Chromosome Conformation Capture (Hi-C) assay”.

      (6b) In multiple places, 'PLCL2' gene is written as 'PCLC2'.

      Corrected as suggested.

      (6c) What is the biological relevance of considering 'active' genes with FPKM {greater than or equal to} 1? This needs clarification.

      In RNA-seq analysis, gene expression levels are often quantified using FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Setting an FPKM threshold for defining "active" genes is biologically relevant because it helps to distinguish genuinely expressed genes from background noise and lets researchers focus on genes that are more likely to have a significant biological impact. A common threshold for defining "active" genes is FPKM ≥ 1. Genes with FPKM values below this threshold may be transcribed at very low levels or could be background noise.
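
      As a worked illustration of the threshold described above, the standard FPKM definition and the FPKM ≥ 1 cutoff can be written as follows. The function names and the example numbers are hypothetical; this is a sketch of the definition, not of the quantification software used in the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_active(fpkm_value, threshold=1.0):
    """Apply the 'active gene' cutoff (FPKM >= 1 by default)."""
    return fpkm_value >= threshold
```

      For instance, 100 fragments on a 2 kb transcript in a library of 50 million mapped fragments gives an FPKM of exactly 1, which sits right at the cutoff and counts as active.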

      (6d) The understanding of differentially methylated genes at promoters is underrated as per the authors. However, it is unclear why, leaving DNA methylation aside, they selected histone modification as the basis of epigenetic reprogramming in the myometrium.

      DNA methylation indeed plays a crucial role in evaluating the impact of cis-acting elements on gene regulation. Large-scale studies, such as the comprehensive analysis of the myometrial methylome landscape in human biopsies (Paul et al., JCI Insight, 2022, PMID: 36066972), have provided valuable insights. When integrated with histone modification and chromatin looping data, contributed by our group and collaborators, future secondary analyses leveraging machine learning are poised to further elucidate the mechanisms underlying myometrial transcriptional regulation.

      (6e) How does the identification of PGR as an upstream regulator of PLCL2 gene expression in human myometrial cells contribute to our understanding of progesterone signaling in myometrial function?

      In a previous study, we demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model and identified PLCL2's role in negatively modulating oxytocin-induced myometrial cell contraction (Peavy et al., PNAS, 2021, PMID: 33707208). The present study builds on this by providing evidence for a direct regulatory mechanism in which PGR influences PLCL2 transcription, likely through a cis-acting element located 35 kb upstream. These findings suggest that PLCL2 acts as a mediator of PGR-dependent myometrial quiescence prior to labor, rather than merely participating in a parallel pathway. Further in vivo studies are necessary to delineate the extent to which PLCL2 mediates PGR activity, particularly the contraction-dampening function of the PGR-B isoform.

      (7) Grammatical error: The manuscript has numerous grammatical errors. Please correct them.

      Corrections have been made as suggested.

      (8) Use of single-cell data: Though from the Methods section, it can be understood that single-cell RNA-seq was done to identify CRISPRa gRNA expressing cells to characterize the effect of gene activation, some results from single-cell data e.g., cell clustering, cell types, gRNA expression across clusters could be added for better elucidation.

      As the reviewer suggested, we have prepared a file “PerturbSeq_summary.xlsx” (Dataset S9) to provide additional results of the Perturb-seq data analysis. It includes 2 spreadsheets: “Cell_per_gRNA” for clustering and “Protospacer_calls_per_cell” for gRNA expression across clusters.

      Reviewer #2 (Recommendations For The Authors):

      (1) The following are a number of grammatical issues in the abstract. I suggest having a careful read of the entire manuscript to identify additional grammatical issues as I may not be able to highlight all of these issues.

      (1a) "The myometrium plays a critical component during pregnancy." change component to role.

      (1b) "It is responsible for the uterus' structural integrity and force generation at term," → replace "," with "."

      (1c) Also, I suggest rephrasing the first 2 sentences to: The myometrium plays a critical role during pregnancy as it is responsible for both the structural integrity of the uterus and force generation at term.

      (1d) "Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping." Remove "the", and modify to "Here we investigated human term pregnant".

      (1e) Missing period and sentence fragment, "PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2."

      Corrections have been made as suggested.

      (2) Sentence fragment: Studies on the role of steroid hormone receptors in myometrial remodeling have provided evidence that the withdrawal of functional progesterone signaling at term is due to a stoichiometric increase of progesterone receptor (PGR) A to B isoform-related estrogen receptor (ESR) alpha expression activation at term. (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).

      The statement has been updated:

      “Studies on the role of steroid hormone receptors in myometrial remodeling suggest that the withdrawal of functional progesterone signaling at term results from a stoichiometric shift favoring the PGR-A isoform over PGR-B. This shift is associated with increased activation of estrogen receptor alpha (ESR1) expression at term (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).”

      (3) FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as Cx43 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993).

      Use Gja1 (Gap junction alpha 1) as the current correct gene, not Cx43.

      Also, several references predate Nadeem, Farine et al. 2018 and are more appropriate to use as references for the role of Ap-1 proteins in regulating Gja1; PMID: 15618352 and PMID: 12064606 were the first to show this relationship in myometrial cells.

      The statement has been updated as suggested:

      “FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as GJA1 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993)”

      (4) Define PLCL2 on first use.

      Updated as suggested.

      (5) There are a number of issues with this section, "Matched sSpecimens of gravid myometrium were collected at the margin of hysterotomy from women undergoing clinically indicated cesarean section at term (>38 weeks estimated gestation age) without evidence of labor. Specimens of healthy, non-gravid myometrium were also pecimens were collected from uteri removed from pre-menopausal women undergoing hysterectomy for benign clinical indications."

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 (Heinz, Benner et al. 2010).

      Please clarify what background is used for motif enrichment.

      We used the default background sequences generated by HOMER from a set of random genomic sequences matching the input sequences in terms of basic properties, such as GC content and length. We have added more details to the Methods section:

      “DNA-binding factor motif enrichment analysis

      Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 with default background sequences matching the input sequences (Heinz, Benner et al. 2010).”

      (7) "Six of the seven regions are also co-localized with previously published genome occupancy of transcription regulators curated by the ReMap Atlas"

      Please clarify if this Atlas includes myometrial tissues or not and clarify the cell types included in the atlas.

      According to the UCSC Genome Browser and the reference by Hammal et al. (2022), the current ReMap database includes PGR ChIP-seq data from human myometrial biopsies, available under NCBI GEO accession number GSE137550, alongside data from various other cell and tissue types. ReMap provides valuable insights into potential functional cis-acting elements in the genome from a systems biology perspective. However, tissue specificity requires independent validation.

      (8) "Notably, 76% of the putative super-enhancers are co-localized with known PGR-occupied regions in the human myometrial tissue (Figure S2). This is significantly higher than the 20% co-localization in the regular enhancer group (Figure S2)."

      Because there is a huge difference in the size of the putative super enhancer regions and the isolated enhancers this comparison is not appropriate as conducted. The comparison needs to account for the difference in size of the regions. Please provide P values for significance statements.

      We acknowledge the reviewer's concern that our initial statement was overstated and potentially misleading, given the substantial difference in size between putative super-enhancer regions and regular enhancers. Rather than emphasizing the enrichment, it would be more accurate to simply describe our observation that super-enhancers encompass more PGR-occupied regions.

      Here is the updated version:

      “Notably, 76% of the putative super-enhancers co-localize with known PGR-occupied regions in human myometrial tissue, compared to 20% co-localization observed in regular enhancers (Figure S2).”

      Reviewer #3 (Recommendations For The Authors):

      (1) Title is extremely misleading, as here we do not get a view of the epigenomic landscape, but rather sparse data related to H3K27ac and H3K4me (focusing on enhancers) and chromatin conformation associated with the PLCL2 transcription start site (TSS).

      As suggested, the title is modified to “Assessment of the Histone Mark-based Epigenomic Landscape in Human Myometrium at Term Pregnancy”.

      (2) Improve the first result paragraph by providing a clear rationale for the experiments and their objectives, as well as introducing the samples used. Rather than simply listing approaches and end results in Table 1, offer concise explanations for the experiments alongside the supporting data presented in detailed figures. Using appropriate figures/graphs to effectively contextualize these datasets would be greatly appreciated by readers and would add more value to this research. Currently, it is difficult for us to assess and appreciate the quality of the data.

      The following statement has been added at the beginning of the Results section:

      "To better understand the regulatory network shaping the myometrial transcriptome before labor, we analyzed transcriptome and putative enhancers in individual human myometrial specimens. Using RNA-seq, we identified actively expressed RNAs, while ChIP-seq for H3K27ac and H3K4me1 was used to map putative enhancers. Active genes were associated with nearby putative enhancers based on their genomic proximity. Additionally, chromatin looping patterns were mapped using Hi-C to further link active genes and putative enhancers within the same chromatin loops."

      (3) The statistics for every sequencing approach need to be provided for each sample (e.g., RNA-seq: number of total reads, number of mapped reads, % of mapped reads; ChIP-Seq: number of mapped reads, % of mapped reads, % of duplicates).

      We have generated the summary table of each dataset included in this study (Dataset S7) [NGS-summary.xls].

      (4) Figure S1: The rationale behind comparing the Dotts study and yours regarding H3K27ac-positive regions needs to be better defined. Why is this performed if the data will not be used afterwards? What are the conserved regions associated with vs the ones that are variable? Is this biologically relevant? Why not use only the regions conserved between the 6 samples, to have more robust conclusions?

      The purpose of comparing our data with the Dotts dataset is to highlight the degree of variation across studies. In this study, we focused on addressing specific biological questions using our own dataset rather than developing methodologies for meta-analysis. Future advancements in meta-analysis techniques could leverage the combined power of multiple datasets to provide deeper insights.

      (5) Perhaps due to a lack of details, I am unable to ascertain how the putative myometrial enhancers were defined. In Dataset S1, it is stated, "we define the regions that have overlapping H3K27ac and H3K4me1 marks as putative myometrial enhancers at the term pregnant nonlabor stage (Dataset S1)". Within Dataset S1, for subjects 1, 2, and 3, H3K27ac and H3K4me1 double-positive enhancers are shown in term pregnant, non-labor human myometrial specimens, with approximately 100 regions corresponding to 131 (sample 1), 127 (sample 2), and 140 (sample 3) common peaks. However, in Figure 1a, reference is made to the 13114 putative enhancers commonly present across the three specimens. Is Dataset S1 intended to represent only a small fraction of the 13114 putative enhancers? Detailed analyses need to be conducted and better showcased.

      Dataset S1 has been updated to list all 13,114 putative enhancers.

      (6) For the gene expression analyses of RNA-seq data, FPKM values were utilized. However, it is unclear why the gene expression count matrix was normalized based on the ratio of total mapped read pairs in each sample to 56.5 million for the term myometrial specimens. I would recommend exercising caution regarding the use of FPKM expression units, as samples are normalized only within themselves, lacking cross-sample normalization. Consequently, due to external factors unaccounted for by this normalization method, a value of 10 in one sample may not equate to 10 in another.

      We value the reviewer’s input. This question will be addressed in future secondary data analyses with suitable methodologies, as it is beyond the scope of this study.

      (7) In Figure 1b, the authors have categorized their 12157 active genes into 3 bins based on FPKM values: >5 FPKM >1, >15 FPKM >5, and >15 FPKM. However, in the text, they describe these as 'actively high-expressing genes (FPKM >= 15)'. I would advise caution regarding the interpretation of these values, as an FPKM of 15 is not typically associated with highly expressed genes. According to literature and resources such as the Expression Atlas, an FPKM of 15 is generally considered to represent a low to medium expression level.

      We appreciate the reviewer’s feedback. This question will be revisited during secondary data analyses using appropriate methodologies, as it falls outside the scope of the present study.

      To increase readability and clarity, we modified the sentence as follows: More than 40% of the 540 putative super enhancers are located within a 100-kilobase distance of high-expressing genes (FPKM >= 15), while only 7.3% of putative myometrial super enhancers are found near low-expressing genes (5 > FPKM >= 1) (Figure 2B).

      (8) Out of the 12157 active genes, approximately two-thirds have an FPKM >15. Was this expected? How does this correspond to what is observed in the literature, particularly in other similar studies (https://pubmed.ncbi.nlm.nih.gov/30988671/ ; https://pubmed.ncbi.nlm.nih.gov/35260533/ ) .

      This is indeed an intriguing question that merits further exploration in future secondary analyses.

      (9) It is also surprising to see that for the motif enrichment analysis (Fig. 1C), the P-values are small. This is probably because the percentage of target sequences with the motif is very similar to the percentage of background sequences with the motif. For instance, for selected genes in Figure 1C: AP-1 (50.68% vs. 46.50%), STAT5 (28.08% vs. 25.04%), PGR (17.90% vs. 16.12%), etc. Can one really say that you have a biologically relevant enrichment for values that are so close between target sequences and background sequences?

      The reviewer’s comment is noted. Biological relevance will be examined experimentally through wet-lab assays in future studies.

      (10) For Figure 2, again not convinced that FPKM >= 15 can be used to say: Compared with the regular putative enhancers, the putative myometrial super-enhancers are found more frequently near active genes that are expressed at relatively higher levels (Figure 1B and Figure 2B). A higher threshold should be used if they want to say this.

      To compare the association of putative enhancers with active genes expressed at different levels, we categorized the active genes into three groups based on their FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. These groups are defined as follows: the top third active genes (FPKM ≥ 15), the middle third active genes (5 ≤ FPKM < 15), and the bottom third active genes (1 ≤ FPKM < 5). By "active genes expressed at relatively higher levels," we refer specifically to the top third active genes with FPKM values of 15 or higher, indicating their relatively higher expression levels compared to the other groups of active genes.
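
      The grouping described above can be written as a small classification function. This is only a minimal sketch: the function name `fpkm_group` is hypothetical, the thresholds are the ones stated in the response, and treating genes with FPKM below 1 as not active is an assumption inferred from the lowest bin.

```python
def fpkm_group(fpkm: float) -> str:
    """Assign a gene to an expression group using the thresholds stated above."""
    if fpkm >= 15:
        return "top third (FPKM >= 15)"
    if fpkm >= 5:
        return "middle third (5 <= FPKM < 15)"
    if fpkm >= 1:
        return "bottom third (1 <= FPKM < 5)"
    # Assumption: genes below the lowest bin are not counted as active.
    return "not active (FPKM < 1)"
```

      Using half-open bins (for example, 5 <= FPKM < 15) ensures that every active gene falls into exactly one group.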

      (11) More detailed explanations and methods are needed regarding how the data for Figure S2 was obtained.

      The following details were added to the methods section:

      “Colocalization of super enhancers and PGR genome occupancy was compared by calling peaks from previously published PGR ChIP-seq data (GSM4081683 and GSM4081684). The percentages of enhancers and super enhancers that manifest PGR occupancy were calculated by overlapping the genomic regions in each category with PGR occupancy regions.”
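
      The overlap percentage described in this added passage could be computed along the following lines. This is a hedged sketch rather than the authors' pipeline: the function name `percent_with_overlap`, the tuple layout, and the BED-style half-open coordinates are assumptions, and the example coordinates in the usage note are invented.

```python
from collections import defaultdict

def percent_with_overlap(regions, peaks):
    """Percentage of regions that overlap at least one peak.

    Both inputs are lists of (chrom, start, end) tuples with
    BED-style half-open coordinates (an assumption of this sketch).
    """
    if not regions:
        return 0.0
    # Group peaks by chromosome so each region is only compared locally.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in regions:
        for p_start, p_end in peaks_by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if start < p_end and p_start < end:
                hits += 1
                break
    return 100.0 * hits / len(regions)
```

      For example, with three hypothetical enhancers of which only one overlaps a PGR peak, the function returns one third as a percentage. Book-ended intervals that merely touch are not counted as overlapping under the half-open convention.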

      (12) In Figure 2C, there is no information provided on the genes used to obtain the results. It would be helpful to include examples of these genes, along with their expression values, for instance.

      The expression levels of the 346 active genes that are associated with myometrial super enhancers are included in Dataset S4, along with results of the updated gene ontology enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) of Knowledgebase v2024q4. Selected pathways of interest are listed in updated Figure 2C.

      (13) The linking of PLCL2-related data to the first part of the story is lacking, and the rationale behind it is missing. This entire section should be more detailed, and the data should be expanded to better reflect the context.

      As suggested, we included the following statement at the beginning of the section “Cis-acting elements for the control of the contractile gene PLCL2”:

      “We previously demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model, as well as PLCL2’s function in negatively modulating oxytocin-induced myometrial cell contraction (Peavey et al., 2021). However, the mechanism underlying PGR regulation of PLCL2 remains unclear. Taking advantage of the mapped myometrial cis-acting elements, we aimed to identify the cis-acting elements that may contribute to PLCL2 transcriptional regulation, with a special interest in the PGR-related enhancers.”

      The context is that our results provide additional evidence supporting direct regulation of PLCL2 transcription by PGR, likely through the 35-kb upstream cis-acting element. This finding suggests that PLCL2 likely acts as a mediator of PGR-dependent myometrial quiescence before labor rather than as a mere passenger on a parallel pathway. Further studies using in vivo models are needed to determine the extent to which PLCL2 mediates the contraction-dampening function of PGR, especially that of the PGR-B isoform.

      (14) The entire Hi-C data should be presented to allow for the assessment of its quality and further value.

      The revised manuscript has included the Hi-C quality control summary in Dataset S8 [HiC-QC-Summary.xlsx].

      (15) The authors state: "For the purpose of functional screening, we focus on H3K27ac signals instead of using H3K27ac/H3K4me1 double positive criterium to cast a wider net." However, it is unclear how many of the targeted regions contained H3K27ac/H3K4me1 peaks. Were enhancers or super-enhancers targeted, and if so, how did they compare to H3K27ac sites?

      The numbers of H3K27ac/H3K4me1 double positive peaks are recorded in Figure 1A. Compared to the numbers of H3K27ac intervals (Table 1), the H3K27ac/H3K4me1 double positive peaks are 62.9%, 70.7%, and 61.2% of corresponding H3K27ac intervals in each individual specimen.

      (16) For the first set of data (Table 1), the authors state, "Together, these results reveal an epigenomic landscape in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition." While it is acknowledged that an epigenetic landscape exists in all tissues, there is a lack of clarity regarding this landscape in the current manuscript, as we are only presented with a table containing numbers.

      This sentence has been revised to: “Together, these results delineate a map of H3K27ac and H3K4me1 positive signals in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition.”

      (17) For S1, the authors conclude: These data together highlight the degree of variation in mapping the epigenome among specimens and datasets. This conclusion seems somewhat perplexing, and I find myself in partial disagreement. Firstly, providing a clear rationale for this section would strengthen the conclusions. It's important to consider what factors may contribute to this variability. It could simply be attributed to differences in experimental settings, such as variations in samples, protocols used, antibodies, sequencing departments, or overall data quality. Deeper analyses of the data could have provided more information.

      We agree with the reviewer that deeper analyses are needed in order to extract more information among studies. However, appropriate methods for meta-analyses should be carefully evaluated and employed for this purpose. We humbly believe that such a task should belong to future studies that may combine available datasets for secondary analyses, leveraging the collective contribution of the reproductive biology community.

      (18) In the methods section, please include an explanation of how enhancers and super-enhancers were defined or add appropriate citations for reference.

      We added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
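
      The merge-and-filter rule quoted above (merge H3K27ac peaks lying within 12.5 kb of each other, then keep merged regions larger than 15 kb) can be illustrated in plain Python. This sketch only approximates the behavior of `bedtools merge -d 12500`; the function names and the example coordinates are hypothetical, and the intersection across samples is not shown.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge (chrom, start, end) intervals whose gap is at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous region: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]

def call_super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Keep merged regions strictly larger than min_size bp."""
    return [
        (chrom, start, end)
        for chrom, start, end in merge_within(intervals, max_gap)
        if end - start > min_size
    ]

# Hypothetical peaks: the first two lie within 12.5 kb and merge into a 16-kb region.
example_peaks = [
    ("chr1", 0, 2_000),
    ("chr1", 10_000, 16_000),
    ("chr1", 40_000, 42_000),
    ("chr2", 0, 1_000),
]
super_enhancers = call_super_enhancers(example_peaks)
```

      In this example only the merged chr1 region spanning 0 to 16,000 passes the 15-kb size filter.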

      (19) Additional description in the "Inferred myometrial PGR activities and the correlation analysis" method section should be included to enhance clarity and understanding.

      The description has been updated:

      “The inferred PGR activities were represented by the T-score, which was derived by inputting the mouse myometrial Pgr gene signature, based on the differentially expressed genes between control and myometrial Pgr knockout groups at mid-pregnancy (Wu, Wang et al., 2022), into the SEMIPs application (Li, Bushel et al., 2021). The T-scores were computed using this signature alongside the normalized gene expression counts (FPKM) from 43 human myometrial biopsy specimens.”

      (20) How was the qPCR analysis performed? Was the ddCT method utilized, and was a reference gene used for control? Additional information would be beneficial.

      Relative mRNA levels were quantified using the standard curve method.

      The following details were added: “Relative levels of genes of interest were normalized to the 18S rRNA.”
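
      For readers unfamiliar with the standard curve method, a generic sketch follows. It assumes the usual formulation, in which Ct is fitted as a linear function of log10 quantity from a dilution series and unknowns are interpolated from that line, followed by normalization to the 18S rRNA reference. The function names and the dilution values in the usage note are hypothetical and are not taken from the manuscript.

```python
import math

def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantity_from_ct(ct, slope, intercept):
    """Interpolate a relative quantity from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)

def normalized_level(target_ct, reference_ct, target_curve, reference_curve):
    """Target quantity divided by reference (e.g. 18S rRNA) quantity."""
    target_qty = quantity_from_ct(target_ct, *target_curve)
    reference_qty = quantity_from_ct(reference_ct, *reference_curve)
    return target_qty / reference_qty
```

      As a usage example, a hypothetical tenfold dilution series with quantities 1, 10, 100, and 1000 and Ct values 30, 27, 24, and 21 gives a slope of about -3 and an intercept of about 30, so a sample with Ct 24 interpolates to a relative quantity of about 100.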

      (21) Regarding the RNA-Seq analysis of Provera-treated human Myometrial Specimens, the continued use of FPKM is not ideal due to potential differences in RNA composition between libraries. Additionally, clarification is needed on why Cufflinks 2.0.2 was used, considering it is no longer supported.

      FPKM (Fragments Per Kilobase of transcript per Million mapped reads) is used in RNA-Seq analysis, because it allows for the normalization of gene expression data, accounting for differences in gene length and sequencing depth, and facilitates comparability across different genes and libraries. This makes it one of the essential tools for accurately measuring and comparing gene expression levels in various biological and clinical research contexts.
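
      The definition of FPKM referred to here can be stated as a one-line calculation. This is a minimal sketch of the standard formula; the function name `fpkm` and the example numbers are illustrative and are not values from the study.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Dividing by gene length in kilobases and by library size in millions
    is equivalent to multiplying the raw ratio by 1e9.
    """
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Illustrative numbers: 300 fragments on a 2-kb transcript in a library
# of 10 million mapped fragments.
example_value = fpkm(300, 2_000, 10_000_000)
```

      With these illustrative inputs the result is 15.0, which happens to coincide with the threshold used above for the top expression group.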

      Cufflinks was once a popular tool for RNA-seq data analysis, transcriptome assembly, and identification of differentially expressed genes. Its usage has declined in recent years due to the emergence of newer and more advanced tools. We used it because the RNA-seq analysis at the early stage of this study was performed with Cufflinks a few years ago, and for comparability and consistency we continued using it for the later RNA-seq analyses. If we were to start a new project now, we would choose newer tools such as HISAT2, Salmon, and DESeq2.

      (22) Overall, sentence structure and typos need to be corrected across the text. Here are some examples:

      Line 17: at term, emerging studies.

      Line 20-22: Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping.

      Line 30-32: PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Line 66-70: However, the role of differential myometrial DNA methylation at contractility-driving gene promoter CpG islands in preterm birth is not thought to be major (Mitsuya, Singh et al. 2014), but given that DNA methylation-mediated gene regulation often occurs outside of CpG islands (Irizarry, Ladd-Acosta et al. 2009), there is still work to be done at this interface.

      Line 80-83: Putative enhancers upstream of the PLCL2, a gene encoding for the protein PLCL2 which has been implicated in the modulation of calcium signaling (Uji, Matsuda et al. 2002) and maintenance of myometrial quiescence (Peavey, Wu et al. 2021), transcriptional start site were subject to functional assessment using CRISPR activation based assays.

      Line 290 : sSpecimens

      We appreciate the reviewer’s kind efforts and have made changes accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      Major comments

      (1) The section on page 20 describing the proteomic analysis of EVs is poorly written and confusing, with a lot of data in the supplement. It is not clear what the proteomics data actually means.

      We appreciate your feedback on the clarity of the proteomic analysis section. We have rewritten the section on page 20 with more detailed information to provide a clearer explanation of the proteomics data and its biological significance. Additionally, we have incorporated a comparative analysis of the EV and total cell lysate proteomes (Fig. 8E, Supplementary Fig. S7A, Supplementary Tables 3 and 4) to aid interpretation of the supplementary data.

      (2) The order of the data could be improved.

      We appreciate your feedback regarding the data organization. We have reorganized the order and position of some data in a more structured and coherent manner, as suggested by the reviewers.

      - Reorganization of the qPCR data (previously Fig. 1C) as Fig. 3A

      - Removal of the data on the growth analysis on raffinose media (previously Fig. 7H).

      - Reorganization of the spotting data of the double mutant (previously Fig. 3B) to Supplementary Fig. S3B

      - Reorganization of the subcellular localization data (previously Fig. 3E) to Supplementary Fig. S4A

      (3) The discussion is repetitive with the introduction and merely summarizes the results and speculates on the mechanism of how the absence of UGGT, leading to ERQC defects, results in defective EV biogenesis/cargo loading in C. neoformans.

      We removed several repetitive sentences in the discussion and provided additional information on proteome analysis.

      Other questions and comments

      (1) Instead of comprehensively analyzing EVs from the UGG1 mutant, a more informative approach to better understanding how defects in N-linked glycosylation impact secretion, would be to do a proteomic analysis on the total secretions (including beta glucanase-treated cells to release classically secreted proteins from the cell wall) and EVs.

      We agree that a comprehensive proteomic analysis of total secretions and classically secreted proteins would provide deeper insights into how defects in N-glycosylation impact secretion in C. neoformans. To address this concern, we performed an additional set of proteomic analyses covering the total cell lysates and the secretome of C. neoformans cultivated in SD broth, and presented the results as Supplementary Table S5 and Supplementary Fig. S7B. These additional analyses provide further insights into the impact of UGG1 deletion on both conventional and unconventional secretion pathways, supporting a more pronounced effect of the UGG1 defect on EV-mediated trafficking. The discussion has been updated accordingly (Page 22, lines 509-514).

      (2) The melanization defect in Ugg1 mutant is not strong. Could the reduction be due to partially compromised Ugg1 mutant growth at 30°C as indicated in the spot tests? Were photos of the spot dilution assays taken at 1 and 2 days to investigate slower growth? Or alternatively were growth curves taken in a liquid culture?

      To assess the melanin synthesis defect more accurately, in addition to the analysis on L-DOPA plates, we measured melanin production in liquid L-DOPA medium after a 3-day incubation and normalized it to cell density (OD<sub>600</sub>). The normalized melanin production data are now included as Fig. 4B in the revised manuscript. The defective laccase activity in the ugg1Δ mutant (Fig. 7C) further corroborates our melanization assay results, which is now also mentioned in the text (Page 18, lines 393-395).

      (3) Is it accurate to say that some virulence factors (i.e. melanin, capsule and phosphatases) are predominantly trafficked through EV's in C. neoformans? Have studies been done to determine the proportion of virulence factors trafficked via EV's versus traditional secretion?

      We thank you for the thoughtful comments. Some virulence factors, such as urease, melanin, and capsule polysaccharides, lack the signal peptide required for targeting to the conventional ER/Golgi secretion pathway. It is generally assumed that the trafficking of these factors in C. neoformans is predominantly mediated by non-conventional secretion via EVs. Additionally, some virulence factors with signal peptides, such as laccase and phosphatases, are also transported via EVs in addition to conventional secretion. A quantitative comparison of the proportion of virulence factors secreted via EVs versus the conventional pathway has not yet been reported, although genetic evidence suggests that conventional secretion also plays a significant role in the export of capsule polysaccharides. Thus, we were careful not to present EVs as the main route of virulence factor secretion in the manuscript.

      (4) There is insufficient background in the introduction linking what is known about the ERQC process to secretion in general. The topic changes from the ERQC process to fungal virulence factor, with a primary focus on non-classical (EV-based) secretion. Classical secretion should also be discussed without assuming that non classical (EV) secretion is the major pathway contributing to fungal virulence.

      We appreciate your insightful comments highlighting the need for more background on the ERQC process and its relationship with secretion. To address the reviewer’s concerns, we have added sentences to describe the key roles of ERQC in conventional protein secretion in the Introduction (Page 5, lines 102-106).

      (5) Figure 1A. What does the blue filled circle with the red outline signify? Fig 1 A legend is not well explained. A summary using material provided in the intro/discussion should be included to briefly explain the process and the differences between fungal species. Please also be aware that the intro starts describing the human ERQC process and then switches to what happens in S. cerevisiae.

      We have revised Figure 1A by removing the red circle and updated the figure legend in the revised manuscript to include more detailed information about the ERQC differences across higher eukaryotes and fungal species.

      (6) Figure 2A. There are no units on the Y-axis. Presumably, the scale is the same for all 3 strains.

      Thank you for your comments. The Y-axis scale is the same for all three strains and, as in Fig. 2C, represents the relative fluorescence intensity obtained from the HPLC analysis. We added the units to the Y-axis in Fig. 2A.

      (7) If Mnl1 and 2 have proposed roles in proteasomal degradation, wouldn't they be expected to have ER retention signals, like Ugg1?

      We appreciate your valuable insights regarding the absence of ER retention signals in Mnl1 and Mnl2. Previous studies have shown that Saccharomyces cerevisiae Mnl1/Htm1 does not possess canonical KDEL/HDEL-like ER retention signals. Instead, its retention in the ER lumen is facilitated through its interaction with protein disulfide isomerase Pdi1, which contains an HDEL sequence (Gauss et al. 2011). Thus, it is expected that non-canonical retention mechanisms—such as interactions with other ER proteins—could contribute to the retention of Mnl1 and Mnl2 within the ER. We added this information to the revised manuscript (Page 8, lines 154-159).

      (8) Figure 1 C qPCR showing change in mRNA in response to ER stress should not be grouped in this figure. It could be standalone or discussed when the spot dilution assays are performed. Anyway, spots tests are more convincing of a role in stress response than qPCR as the ugg1 mutant is sensitive to tunicamycin, DTT and cell wall stressing agents.

      As suggested by the reviewer, we have reorganized the qPCR data as a part of Figure 3 (Figure 3A) in the revised manuscript.

      (9) It is odd that mns1/101 mutants are not sensitive to ER and CW stress given their proposed differing location/function in the pathway (Figure 1) determined from the N-linked profiling. Any explanation? Could there be redundancy?

      We appreciate the reviewer’s observation regarding the lack of ER and CW stress sensitivity in the mns1Δ and mns101Δ mutants, despite their proposed roles in N-glycan processing. We previously reported that the C. neoformans alg3Δ mutant, lacking a critical enzyme responsible for the synthesis of Dol-PP-Man<sub>6</sub>GlcNAc<sub>2</sub> in the N-glycosylation pathway, exhibited clearly impaired N-glycan elongation but showed no detectable growth defects, even under stress conditions in vitro. However, alg3Δ is avirulent in vivo (Thak et al., 2020). Similarly, the mns1Δ101Δ double mutant shows glycan-processing defects that do not compromise cellular fitness under stress conditions but result in attenuated virulence in animal models. These findings suggest that some glycosylation-related defects may affect in vivo pathogenicity more severely than in vitro stress sensitivity.

      (10) Although the Silver-stained gels of the ugg1 mutant are not particularly informative, why weren't they (and Con A blots) performed for the other mutants?

      The overall decrease in hypermannosylated glycans observed in the ugg1Δ mutant allowed us to detect clear alterations in protein glycosylation patterns in the lectin blot using Galanthus nivalis agglutinin, which recognizes terminal α1,2-, α1,3-, and α1,6-linked mannose residues. In contrast, the changes limited to a few glycan species in the other mutants, including mns1Δ, mns101Δ, and mns1Δ101Δ, are too subtle to be detected in the lectin blot, because the average lengths of their N-glycans differ only slightly from those of the WT. Therefore, we presented the lectin blotting data only for the ugg1Δ mutant.

      (11) If there is ER stress under normal conditions in the Ugg1 mutant then technically this mutant should be growing more slowly under normal conditions. This is difficult to predict in a spot dilution assay where growth is only visualized at day three when any growth defect may have been corrected. The slower growth rather than the reduced secretion of GXM specifically is therefore more likely to be responsible for the reduced virulence.

      We appreciate the reviewer’s insightful comment regarding the interplay between ER stress, growth defects, and virulence attenuation in the ugg1Δ mutant. While retarded growth in C. neoformans is often associated with reduced virulence, there are a few exceptions. For instance, disruptions in cell cycle progression in C. neoformans have been reported to result in larger capsule sizes, which instead enhance in vivo virulence when analyzed in Galleria mellonella infection models (García-Rodas et al., 2014). This highlights that a growth defect alone is not sufficient for virulence attenuation. In the case of the ugg1Δ mutant, we speculate that the almost complete loss of virulence is attributable not only to its growth retardation but also to its impaired secretion of key virulence factors, including the polysaccharide capsule.

      (12) The rationale for using leucine analogue 5',5',5'-trifluoroleucine (TFL), in a growth assay (Fig. 3C) to determine whether the defective ugg1Δ phenotypes are induced by ER stress caused by misfolded protein accumulation is not explained.

      The leucine analogue 5',5',5'-trifluoroleucine (TFL) can be incorporated into newly synthesized proteins, disrupting normal folding and thus leading to the generation of misfolded proteins (Trotter et al., 2002; Cowie et al., 1959). In the context of a defective ERQC pathway, these misfolded proteins cannot be adequately repaired, resulting in their accumulation and triggering ER stress. Excessive ER stress may ultimately inhibit cell growth in the presence of TFL. This explanation has been incorporated into the revised manuscript (Page 11, lines 236–241).

      (13) I would argue that only the Ugg1 and double Mns mutant were defective in virulence. For the single mutants, it looks like no difference was found relative to WT. The longer median survival of these mutants (if significant) is most likely due to poor infection technique.

      We agree with the reviewer that the mns1Δ and mns101Δ single mutants show no significant difference in in vivo virulence compared to the WT strain, unlike the mns1Δ101Δ double mutant, which showed significantly attenuated virulence. This point was already stated in the manuscript (Page 13, lines 267-269).

      (14) The authors conclude that the ugg1Δ strain specifically is impaired in extracellular secretion of capsular polysaccharides but is this via classical (SAV1) secretion or EVs?

      In addition to EV-mediated transport, capsular polysaccharide secretion can occur via the Sav1 (Sec4p)-mediated classical secretion pathway. However, our proteome data from total cell lysates indicated that Sav1 protein levels were comparable between the WT and ugg1Δ strains, suggesting that Sav1p function itself might not be impaired. Given that the ugg1Δ mutant exhibits altered vesicular structures (Supplementary Fig. S6) and loss of microvesicles (Fig. 8A), we speculate that a defect might occur at a post-Sav1p step, such as vesicle fusion with the plasma membrane. Such a defect likely contributes to the complete loss of capsular polysaccharide secretion in the ugg1Δ strain, in which EV biogenesis and cargo loading are also severely impaired, producing EVs that lack capsular polysaccharides (Figure 8F). However, further studies are needed to define the contribution of SAV1 to the secretion of capsular polysaccharides in the ugg1Δ strain.

      (15) The rationale for doing 7 H is very confusing.

      The experiment assessing raffinose utilization as a carbon source was inspired by the previous work of Garcia-Rivera et al., who reported that the cap59Δ mutant is unable to utilize raffinose due to a defect in the secretion of raffinose-hydrolyzing enzymes. As another way to probe potential defects in the conventional secretion pathway, we examined the growth of the ugg1Δ mutant in the presence of raffinose. Because the manuscript already contains extensive data, we have decided to remove this complementary result.

      (16) It is speculated in the discussion that ER stress impacts lipid/sterol synthesis and that LDs (lipid droplets?) aid the UPR and ERAD in degrading misfolded proteins during ER stress in S. cerevisiae. The authors mention that they observed a drastic increase in LDs in the ugg1Δ mutant. Where is this data? Even with the data, this is all speculation. The authors also speculate that increased numbers of vacuoles in ugg1 (where is the data?) could be the cause of the altered vesicular structures observed in the mutants, which may indicate abnormal lipid homeostasis caused by the ERQC defects, which could, in turn, affect EV biogenesis. Again, this is speculative.

      The data on lipid droplets (LDs) and vacuole staining are presented in Supplementary Figure S6, showing a drastic increase in LDs and an increase in vacuolar size in the ugg1Δ mutant compared to the wild-type strain, especially under capsule-inducing conditions. In addition to these changes in vesicular structures, our preliminary sphingolipid and sterol analysis of the surface lipid fraction of the ugg1Δ mutant led us to hypothesize that ERQC defects may impact lipid metabolism, which in turn could influence EV biogenesis and membrane properties. We expect these findings to provide a foundation for future studies exploring the link between ERQC, lipid homeostasis, and EV biogenesis. We have revised our speculation on the association between EV biogenesis and the abnormal lipid homeostasis caused by ERQC defects by adding information on our preliminary lipid profile data and by mentioning that the ugg1Δ mutant lacks microvesicles, which are derived from the plasma membrane (Page 24, lines 554-559).

      Reviewer #2 (Recommendations for the authors):

      (1) My suggestions for the authors are the same as those presented in the public review: (1) reducing the text in certain sections of the paper to improve readability for the audience, and (2) reconsidering the figures to reduce the amount of information in each one, moving some of the content to the supplementary material.

      We thank the reviewer for their constructive suggestions regarding the organization and readability of the manuscript. As suggested, we addressed your concerns as follows:

      (1) Reducing the text in the Introduction, Results, and Discussion sections by removing repetitive statements and simplifying complex descriptions where possible.

      (2) Changing the presentation of figures: we have also reorganized the presentation of some data by moving non-essential data to the supplementary material. The updated figures and supplementary materials have been clearly referenced in the text to guide readers.

      (3) Reorganization of materials and methods: some parts of methods were moved to Supplementary Information

      (4) Removal of Figure 7H and the sentences describing the result

      More detailed explanations on the reduction and reorganization are also described in the response to the major comments (2) and (3) made by Reviewer #1.

      (2) Figure 3, for example, shows no difference in fungal growth under different cultivation conditions. This information is valuable but could be mentioned in the text, with the image provided as supplementary material, focusing the figure only on images that show significant growth differences among the strains. I suggest a similar approach for other figures so that the authors can include only the most relevant results in the main body of the article and move some figures to the supplementary materials.

      For Fig. 3, the spotting data of the double mutant (previously Fig. 3B) is now presented in the supplementary information (Supplementary Fig. S3B). Additionally, the subcellular localization data (previously Fig 3E) was also moved to the supplementary material (Supplementary Fig. S4A).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 43 "EV-mediated transport of virulence bags" doesn't make sense. EVs have been described as "virulence bags" (and are in this work later in the introduction) but this should here be "transport of virulence factors" or "compounds associated with virulence" but only if you have confirmed that the "cargo" is consistent with this- which is not evident in the abstract.

      Thank you for your insightful comment. We have revised this to "EV-mediated transport of virulence factors" in line with your suggestion.

      (2) Line 49 "secretory pathway" - is there not more than one secretion pathway?

      Thank you for pointing this out. The term "secretory pathway" has been updated to "secretory pathways" to acknowledge the presence of both conventional and unconventional secretion mechanisms.

      (3) Line 53 "recognizes folding defects, repairs them, and ensures the translocation of irreparable misfolded proteins" should be "recognizes folding defects and repairs them or ensures the translocation of irreparable misfolded proteins".

      Thank you for pointing this out. We have revised the sentence as you suggested.

      (4) Lines 88-90 ALG needs to be written out the first time - Asn-linked glycans. Also, consider adding that ALG genes are present in most eukaryotes as it is unclear what you are comparing C. neoformans to.

      Thank you for your helpful comment. We have revised the text to write out "ALG" as "Asn-linked glycosylation" and added the sentence “ALG genes are evolutionary conserved in most eukaryotes” in the revised manuscript (Page 4, line 84).

      (5) Line 99 Cryptococcus has already been abbreviated to C. so don't write it out again.

      We have corrected "Cryptococcus" to “C.” throughout the manuscript after its first mention.

      (6) Line 152- tunicamycin and DTT are not described yet, which may make it challenging for some readers to understand what these drugs are doing/why they were used. What is on lines 156 and 157 for these drugs should go up with the first mention of these drugs.

      Thank you for your helpful suggestion. We have revised the manuscript to include the descriptions and purpose of using tunicamycin (TM) and dithiothreitol (DTT) immediately following their first mention, as recommended (Page 10, lines 208-210).

      (7) The text for Figure 1 C is inaccurate. High temperature also induced KAR2, as noted above, but inaccurately stated in line 160. There is no comment on the significant UGG1 increase with tunicamycin or that KAR2 was highest in this condition.

      Thank you for your thoughtful comment. We have better clarified the significant increase of UGG1 expression following tunicamycin treatment and KAR2 induction upon heat stress in the revised manuscript (Page 10, lines 216-217). Please note that Fig. 1C was revised and is now referred to as Fig. 3A.

      (8) Figure 2B is not well explored/explained. There appears to be more protein in the mutant, including of higher weight in the intracellular compartment. It is difficult to ascertain if there is more too in the secretion phase with this gel. The methods do not specifically describe the concentration of protein added - just volume. Is what we are seeing a loading issue vs real differences?

      Thank you for your insightful comments regarding Figure 2B. We added information on amounts of protein (30 µg per lane) in the legend of Figure 2B.

      The main purpose of Fig. 2B is to examine the altered glycosylation pattern caused by the ERQC defect by detecting glycoproteins using the Galanthus nivalis agglutinin, which specifically binds terminal α1,2-, α1,3-, and α1,6-linked mannose residues. The result of lectin blotting indicated that glycoproteins are more abundantly detected in the secretion fraction than in the soluble intracellular fraction, consistent with the general notion that more than 50% of secretory proteins are glycoproteins. Also, the more abundant proteins with decreased molecular weight in the secretion fraction of the _ugg1_Δ mutant supported the _N_-glycan profiles with decreased hypermannosylation in the _ugg1_Δ mutant. We added the purpose and a more detailed interpretation of Figure 2B in the revised manuscript (Page 9, lines 174-179).

      (9) Line 242 "melanin pigment" is redundant as melanin is a pigment.

      We thank the reviewer for pointing out the redundancy in the phrase. We revised the text to simply state "melanin".

      (10) Line 250 drops "completely" especially as the mutant did colonize the lungs of mice.

      To avoid any possible misunderstanding, we removed the term "completely" in the revised manuscript.

      (11) Line 275- need to reference 18B7 as it is first introduced here.

      We added the reference on the antibody 18B7 in the revised manuscript.

      (12) Line 308- there are specific techniques to measure GXM size that could validate or refute the statement on "incomplete" polysaccharides. For example, DOI:10.1128/EC.00268-09.

      We appreciate the valuable suggestion on specific techniques to measure GXM size, which will be one of the key experiments in our future study. In the revised manuscript, we cited the suggested reference to indicate the need for validation of our statement (Page 14, lines 316-318).

      (13) Line 496 "mammals" - why is this used when the study is on a fungus, not a mammal? The structure of the first 2 paragraphs can be clearer to focus more on fungal biology.

      We have compared both mammals and fungi to emphasize that the ERQC system is conserved among eukaryotes but has diverged with a few species-specific features. This comparison is relevant in the context of understanding the evolutionarily unique features of ERQC pathways in C. neoformans. We modified the first 2 paragraphs to clarify the main issue of our present study (Page 21, lines 472-483).

      (14) Line 525- the ugg mutant was not avirulent as CFU was present and histopathology in the supplementary figures shows the tissue with ugg1 deletion was not normal (although the images are not especially easy to review). Yes, the mutant did not kill under your test conditions, but it was not avirulent (incapable of causing disease). Significantly attenuated or other descriptors should be utilized. Line 548 is also thus incorrect "complete loss of virulence").

      We appreciate the reviewer’s concern regarding the description of the _ugg1_Δ mutant as avirulent. We agree that describing the mutant merely as "avirulent" may not fully capture the observed phenotypes in the CFU and histopathological data, since we cannot exclude the possibility that the _ugg1_Δ mutant retains the ability to establish an infection. Thus, we have revised the text by describing the _ugg1_Δ mutant as "almost avirulent".

      (15) Line 597- the study by Fukuoka used kidney cells. It is misleading to not clearly state that this finding of ER stress was NOT done in fungi as the way it is presented makes it read as if this work was performed in C. neoformans. This should be clarified. This should also be double-checked and clarified for other statements, such as the reference to Harada in line 606, as this study used melanoma cells. These cell types are very different from cryptococcus- though I absolutely concur that lessons can be learned from comparative assessments.

      We thank the reviewer for pointing out the need to clarify the experimental context of the cited studies. We explicitly stated the host cell types used in the referenced studies by Fukuoka et al. and by Harada et al., respectively, in the revised manuscript (Page 25, lines 560 and 568).

    1. Author response:

      Joint Public Review:

      Summary:

      In this study, Daniel et al. used three cognitive tasks to investigate behavioral signatures of cerebellar degeneration. In the first two tasks, the authors found that if an equation was incorrect, reaction times slowed significantly more for cerebellar patients than for healthy controls. In comparison, the slowing in the reaction times when the task required more operations was comparable to normal controls. In the third task, the authors show increased errors in cerebellar patients when they had to judge whether a letter string corresponded to an artificial grammar.

      Strengths:

      Overall, the work is methodologically sound and the manuscript well written. The data do show some evidence for specific cognitive deficits in cerebellar degeneration patients.

      Thank you for the thoughtful summary and constructive feedback. We are pleased that the methodological rigor and clarity of the manuscript were appreciated, and that the data were recognized as providing meaningful evidence regarding cognitive deficits in cerebellar degeneration.

      Weaknesses:

      The current version has some weaknesses in the visual presentation of results. Overall, the study lacks a more precise discussion on how the patterns of deficits relate to the hypothesized cerebellar function. The reviewers and the editor agreed that the data are interesting and point to a specific cognitive deficit in cerebellar patients. However, in the discussion, we were somewhat confused about the interpretation of the result: If the cerebellum (as proposed in the introduction) is involved in forming expectations in a cognitive task, should they not show problems both in the expected (1+3 =4) and unexpected (1+3=2) conditions? Without having formed the correct expectation, how can you correctly say "yes" in the expected condition? No increase in error rate is observed - just slowing in the unexpected condition. But this increase in error rate was not observed. If the patients make up for the lack of prediction by using some other strategy, why are they only slowing in the unexpected case? If the cerebellum is NOT involved in making the prediction, but only involved in detecting the mismatch between predicted and real outcome, why would the patients not show specifically more errors in the unexpected condition?

      Thank you for asking these important questions and initiating an interesting discussion. While decision errors and processing efficiency are not fully orthogonal and are likely related, they are not necessarily the same internal construct. The data from Experiments 1 and 2 suggest impaired processing efficiency rather than increased decision error. Reaction time slowing without increased error rates suggests that the CA group can form expectations but respond more slowly, possibly due to reduced processing efficiency. Thus, this analysis of our data can indicate that the cerebellum is not essential for forming expectations, but it plays a critical role in processing their violations.

      Relatedly, two important questions remain open in the literature concerning the cerebellum’s role in expectation-related processes. The first is whether the cerebellum contributes to the formation of expectations or the processing of their violations. In Experiments 1 and 2, the CA group did not show impairments in the complexity manipulation. As mentioned by the editors, solving these problems requires the formation of expectations during the reasoning process. Given the intact performance of the CA group, these results suggest that they are not impaired in forming expectations. However, in both Experiments 1 and 2, patients exhibited selective impairments in solving incorrect problems compared to correct problems. Since expectation formation is required in both conditions, but only incorrect problems involve a violation of expectation (VE), we hypothesize that the cerebellum is involved in VE processes. We suggest that the CA group can form expectations in familiar tasks, but are impaired in processing unexpected compared to expected outcomes. This supports the notion that the cerebellum contributes to VE, rather than to forming expectations.

      Importantly, while previous experimental manipulations(1–6) have provided important insights, some may have confounded these two internal constructs due to task design limitations (e.g., lack of baseline conditions). Notably, some of these previous studies did not include control conditions (e.g., correct trials) where there was no VE. In addition, other studies did not include a control measure (e.g., complexity effect), which limits their ability to infer the specific cerebellar role in expectation manipulation.

      In addition to the editors’ question, we would like to raise a second important question regarding cerebellar contributions to expectation-related processes. While our findings point to a cerebellar role in VE processes in sequential tasks that is both unique and consistent, we do not aim to generalize this role to all forms of expectations(2,7,8). Another interesting process is how expectations are formed. Expectations can be formed by different processes(2,7,8), and this should be taken into account when defining cerebellar function. For instance, previous experimental paradigms(1–6), aiming to assess VE, utilized tasks that manipulated rule-based errors or probability-based errors, but did not fully dissociate these constructs. In our Experiments 1 and 2, we specifically manipulated error signals derived from previous top-down effects. However, in Experiment 3, the participant’s VE was derived from within-task processes. In Experiment 3, expectations were formed either by statistical learning or by rule-based learning. During the test stage, when evaluating sensitivity to correct and incorrect problems, the CA group showed deficits only when expectations were formed based on rules. These findings suggest that cerebellar patients may retain a general ability to form expectations. However, their deficit appears to be specific to processing rule-based VE, but not statistically derived VE. This pattern of results aligns with the results of Experiments 1 and 2, where the rules are known and based on pre-task knowledge.

      We suggest that these two key questions are relevant to both motor and non-motor domains and were not fully addressed even in the previous, well-studied motor domain. Thus, the current experimental design used in three different experiments provides a valuable novel experimental perspective, allowing us to distinguish between some, but not all, of the processes involved in the formation of expectations and their violations. For instance, to our knowledge, this is the first study to demonstrate a selective impairment in rule-based VE processing in cerebellar patients across both numerical reasoning and artificial grammar tasks.

      If feasible, we propose that future studies should disentangle different forms of VE by operationalizing them in experimental tasks in an orthogonal manner. This will allow us, as a scientific community, to achieve a more detailed, well-defined cerebellar motor and non-motor mechanistic account.

      References

      (1) Butcher, P. A. et al. The cerebellum does more than sensory prediction error-based learning in sensorimotor adaptation tasks. J. Neurophysiol. 118, 1622–1636 (2017).

      (2) Moberget, T., Gullesen, E. H., Andersson, S., Ivry, R. B. & Endestad, T. Generalized role for the cerebellum in encoding internal models: Evidence from semantic processing. J. Neurosci. 34, 2871–2878 (2014).

      (3) Riva, D. The cerebellar contribution to language and sequential functions: evidence from a child with cerebellitis. Cortex. 34, 279–287 (1998).

      (4) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (5) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115 Pt 1, 155–178 (1992).

      (6) Taylor, J. A., Krakauer, J. W. & Ivry, R. B. Explicit and Implicit Contributions to Learning in a Sensorimotor Adaptation Task. J. Neurosci. 34, 3023–3032 (2014).

      (7) Sokolov, A. A., Miall, R. C. & Ivry, R. B. The Cerebellum: Adaptive Prediction for Movement and Cognition. Trends Cogn. Sci. 21, 313–332 (2017).

      (8) Fiez, J. A., Petersen, S. E., Cheney, M. K. & Raichle, M. E. Impaired non-motor learning and error detection associated with cerebellar damage. A single case study. Brain 115, 155–178 (1992).

      (9) Picciotto, Y. De, Algon, A. L., Amit, I., Vakil, E. & Saban, W. Large-scale evidence for the validity of remote MoCA administration among people with cerebellar ataxia. Clin. Neuropsychol. 0, 1–17 (2024).

      (10) Binoy, S., Montaser-Kouhsari, L., Ponger, P. & Saban, W. Remote Assessment of Cognition in Parkinson's Disease and Cerebellar Ataxia: The MoCA Test in English and Hebrew. Front. Hum. Neurosci. 17, (2023).

      (11) Saban, W. & Ivry, R. B. PONT: A protocol for online neuropsychological testing. J. Cogn. Neurosci. 33, 2413–2425 (2021).

      (12) Algon, A. L. et al. Scale for the assessment and rating of ataxia: a live e-version. J. Neurol. (2025). doi:10.1007/s00415-025-13071-7

      (13) McDougle, S. D. et al. Continuous manipulation of mental representations is compromised in cerebellar degeneration. Brain 145, 4246–4263 (2022).

    1. Author response:

      eLife Assessment

      This important study uses an innovative task design combined with eye tracking and fMRI to distinguish brain regions that encode the value of individual items from those that accumulate those values for value-based choices. It shows that distinct brain regions carry signals for currently evaluated and previously accumulated evidence. The study provides solid evidence in support of most of its claims, albeit with current minor weaknesses concerning the evidence in favour of gaze-modulation of the fMRI signal. The work will be of interest to neuroscientists working on attention and decision-making.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We plan to undertake some additional analyses suggested by the Reviewers to bolster the evidence in favor of gaze-modulation of the fMRI signal.

      Reviewer #1 (Public review):

      Summary:

      This study builds upon a major theoretical account of value-based choice, the 'attentional drift diffusion model' (aDDM), and examines whether and how this might be implemented in the human brain using functional magnetic resonance imaging (fMRI). The aDDM states that the process of internal evidence accumulation across time should be weighted by the decision maker's gaze, with more weight being assigned to the currently fixated item. The present study aims to test whether there are (a) regions of the brain where signals related to the currently presented value are affected by the participant's gaze; (b) regions of the brain where previously accumulated information is weighted by gaze.

      To examine this, the authors developed a novel paradigm that allowed them to dissociate currently and previously presented evidence, at a timescale amenable to measuring neural responses with fMRI. They asked participants to choose between bundles or 'lotteries' of food items, which they revealed sequentially and slowly to the participant across time. This allowed modelling of the haemodynamic response to each new observation in the lottery, separately for previously accumulated and currently presented evidence.

      Using this approach, they find that regions of the brain supporting valuation (vmPFC and ventral striatum) have responses reflecting gaze-weighted valuation of the currently presented item, whereas regions previously associated with evidence accumulation (preSMA and IPS) have responses reflecting gaze-weighted modulation of previously accumulated evidence.

      Strengths:

      A major strength of the current paper is the design of the task, nicely allowing the researchers to examine evidence accumulation across time despite using a technique with poor temporal resolution. The dissociation between currently presented and previously accumulated evidence in different brain regions in GLM1 (before gaze-weighting), as presented in Figure 5, is already compelling. The result that regions such as preSMA respond positively to |AV| (absolute difference in accumulated value) is particularly interesting, as it would seem that the 'decision conflict' account of this region's activity might predict the exact opposite result. Additionally, the behaviour has been well modelled at the end of the paper when examining temporal weighting functions across the multiple samples.

      Thank you!

      Weaknesses:

      The results relating to gaze-weighting in the fMRI signal could do with some further explication to become more complete. A major concern with GLM2, which looks at the same effects as GLM1 but now with gaze-weighting, is that these gaze-weighted regressors may be (at least partially) correlated with their non-gaze-weighted counterparts (e.g., SVgaze will correlate with SV). But the non-gaze-weighted regressors have been excluded from this model. In other words, the authors are not testing for effects of gaze-weighting of value signals *over and above* the base effects of value in this model. In my mind, this means that the GLM2 results could simply be a replication of the findings from GLM1 at present. GLM3 is potentially a stronger test, as it includes the value signals and the interaction with gaze in the same model. But here, while the link to the currently attended item is quite clear (and a replication of Lim et al, 2011), the link to previously accumulated evidence is a bit contorted, depending upon the interpretation of a behavioural regression to interpret the fMRI evidence. The results from GLM3 are also, by the authors' own admission, marginal in places.

      We thank the Reviewer for their thoughtful critique. We acknowledge that our formulation of GLM2 does not test for the effects of gaze-weighted value signals beyond the base effects of value, only in place of the base effects of value. In our revision, we plan to examine alternative ways of quantifying the relative importance of gaze in these results.  
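
      The concern about GLM2 can be illustrated with a small simulation. The Python sketch below is not the authors' analysis: the variable names (`sv`, `gaze`, `sv_gaze`), the value ranges, the sample count, and the noise level are all assumptions chosen for illustration. It shows that a gaze-weighted regressor can correlate strongly with its unweighted counterpart, so it may absorb a plain value effect when entered alone, and that entering both regressors in one model is one way to test gaze-weighting over and above the base effect of value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400  # arbitrary number of simulated observations (assumption)

# Hypothetical stand-ins: item value and the fraction of time the item was fixated.
sv = rng.uniform(0.0, 10.0, size=n)
gaze = rng.uniform(0.3, 1.0, size=n)
sv_gaze = sv * gaze  # gaze-weighted value regressor

# Correlation between the gaze-weighted regressor and the unweighted one.
r = np.corrcoef(sv, sv_gaze)[0, 1]

# Simulated signal that depends on value only: there is no true gaze effect here.
y = 1.0 * sv + rng.normal(0.0, 1.0, size=n)


def ols(design, response):
    """Ordinary least-squares coefficients for the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return beta


intercept = np.ones(n)
# Model with the gaze-weighted regressor only (analogous in spirit to GLM2).
beta_alone = ols(np.column_stack([intercept, sv_gaze]), y)
# Model with both regressors, so the gaze-weighted term is tested over and above value.
beta_joint = ols(np.column_stack([intercept, sv, sv_gaze]), y)

print(f"corr(sv, sv_gaze) = {r:.2f}")
print(f"sv_gaze coefficient, entered alone = {beta_alone[1]:.2f}")
print(f"sv_gaze coefficient, alongside sv  = {beta_joint[2]:.2f}")
```

      In this toy setting the signal contains no gaze effect, yet the gaze-weighted coefficient is clearly positive when entered alone and close to zero once the unweighted value regressor is included, which is the pattern the Reviewer's critique anticipates.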

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors seek to disentangle brain areas that encode the subjective value of individual stimuli/items (input regions) from those that accumulate those values into decision variables (integrators) for value-based choice. The authors used a novel task in which stimulus presentation was slowed down to ensure that such a dissociation was possible using fMRI despite its relatively low temporal resolution. In addition, the authors leveraged the fact that gaze increases item value, providing a means of distinguishing brain regions that encode decision variables from those that encode other quantities such as conflict or time-on-task. The authors adopt a region-of-interest approach based on an extensive previous literature and found that the ventral striatum and vmPFC correlated with the item values and not their accumulation, whereas the pre-SMA, IPS, and dlPFC correlated more strongly with their accumulation. Further analysis revealed that the pre-SMA was the only one of the three integrator regions to also exhibit gaze modulation.

      Strengths:

      The study uses a highly innovative design and addresses an important and timely topic. The manuscript is well-written and engaging, while the data analysis appears highly rigorous.

      Weaknesses:

      With 23 subjects, the study has relatively low statistical power for fMRI.

      We thank the Reviewer for their comments on the strengths of the manuscript, and for highlighting an important limitation. We agree that the number of participants in the study, after exclusions, was lower than in a typical fMRI study. However, it is important to note that we do have a lot of data for each subject. Due to our relatively fast, event-related design, we have on average 65 trials per subject (SD = 18) and 5.95 samples per trial (SD = 4.03), for an average of 387 observations per subject (SD = 18). Our model-based analysis looks for very specific neural time courses across these ~387 observations, giving us substantial power to detect our effects of interest. Still, we acknowledge that our small number of subjects limits our power and our ability to generalize to other subjects. We plan to add the following disclaimer to the Discussion section:

      “Together with our limited sample size (n = 23), we may not have had adequate statistical power required to observe consistent effects. Additional research with larger sample sizes is needed to resolve this issue.”
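
      The per-subject observation count quoted in this response can be checked with simple arithmetic. The Python sketch below only multiplies the two reported averages. It assumes that the number of trials and the number of samples per trial are roughly independent across subjects, which the response does not state, and the variable names are illustrative.

```python
# Reported averages from the response above.
trials_per_subject = 65      # mean number of trials per subject
samples_per_trial = 5.95     # mean number of samples per trial

# Under the independence assumption, the product of the two means
# approximates the mean number of observations per subject.
observations_per_subject = trials_per_subject * samples_per_trial

print(round(observations_per_subject))
```

      The product is about 386.75, which rounds to the figure of 387 observations per subject given in the response.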

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception.

      Strengths:

      The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception.

      We thank the reviewer for their positive comments.

      Weaknesses:

      "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." (lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI).

      We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found:

      Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness.

      Related comment for the following excerpts:

      "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99).

      "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188-190).

      "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245-247).

      It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results.

      We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015), as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity.

      We added the following sentence to the discussion:

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      "When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117).

      Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined.

      This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads:

      When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods).

      During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation.

      The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception?

      These are excellent questions, about which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion: Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022).

      Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus.

      "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216).

      It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified.

      We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects.
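
      As a check on the arithmetic of the proportion quoted above (6 of 8 neurons, p = 0.145), the value matches the upper tail of a binomial distribution with a chance level of 0.5. That the test was one-sided is an inference from the number itself, not something stated in the manuscript, and the function name below is illustrative:

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 6 of 8 neurons fired more for misses; chance level assumed to be 0.5.
p_one_sided = binomial_upper_tail(6, 8)
print(round(p_one_sided, 3))  # 37/256, i.e. 0.145
```

      Under the same assumption, a two-sided test would double this value, which would not change the conclusion that the proportion is not significant.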

      The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis?

      Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough:

      For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths).

      Therefore, we added the following sentence in Figures 2, 3, 4 and S3.

      [...] for patients for which we could obtain anatomical images.

      Could the authors speak to the timing of the responses reported in Figure 4? The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception?

      We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.
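
      The fitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the target is assumed to be a density sampled at bin centers, the interior-point solver is replaced by a simple bounded coordinate search (a different, cruder optimizer), and the parameter bounds, step sizes, and synthetic data are all hypothetical.

```python
import math
import random

def gauss(x, mu, sd):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture(x, params):
    """Two-component mixture density; params = (w, mu1, sd1, mu2, sd2)."""
    w, mu1, sd1, mu2, sd2 = params
    return w * gauss(x, mu1, sd1) + (1 - w) * gauss(x, mu2, sd2)

def mse(params, xs, ys):
    """Mean square error between the mixture density and the target density."""
    return sum((mixture(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumed box constraints: weight, then means and SDs in ms.
LOWER = [0.0, 0.0, 5.0, 0.0, 5.0]
UPPER = [1.0, 400.0, 200.0, 400.0, 200.0]

def pattern_search(start, xs, ys, free=(0, 1, 2, 3, 4), n_iter=100):
    """Bounded coordinate search (a stand-in for an interior-point solver)."""
    params = list(start)
    steps = [0.1, 20.0, 10.0, 20.0, 10.0]
    best = mse(params, xs, ys)
    for _ in range(n_iter):
        improved = False
        for i in free:
            for delta in (steps[i], -steps[i]):
                trial = list(params)
                trial[i] = min(UPPER[i], max(LOWER[i], trial[i] + delta))
                loss = mse(trial, xs, ys)
                if loss < best:
                    params, best, improved = trial, loss, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the search when stuck
    return params, best

def fit_best_of(xs, ys, n_runs=20, seed=0):
    """Keep the lowest-error fit over n_runs random initializations."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        start = [rng.uniform(lo, hi) for lo, hi in zip(LOWER, UPPER)]
        runs.append(pattern_search(start, xs, ys))
    return min(runs, key=lambda run: run[1])

# Synthetic target density (NOT the study's data): two latency clusters.
TRUE = (0.5, 150.0, 30.0, 320.0, 30.0)
xs = [10.0 * i for i in range(41)]  # 0 to 400 ms in 10 ms steps
ys = [mixture(x, TRUE) for x in xs]

two_params, two_loss = fit_best_of(xs, ys)
# Single component: fix the weight at 1 and fit only the first mean and SD.
_, one_loss = pattern_search([1.0, 200.0, 100.0, 200.0, 100.0], xs, ys, free=(1, 2))
print("two-component MSE:", two_loss)
print("single-component MSE more than 4x larger:", one_loss > 4 * two_loss)
```

      Taking the best of several random starts guards against local minima, which is presumably why 20 initializations are used in the procedure described above; the final comparison mirrors the "> 4 times" criterion against a single component.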

      We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above):

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      Reviewer #2 (Public Review):

      The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit.

      We thank the reviewer for these positive comments.

      Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration:

      (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task.

      This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.”

      In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. This last sentence now reads (to address a point made by Reviewer 1 about motor preparation):

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field.

      We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective neurons (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked.

      (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.

      We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes.

      We added a sentence to this effect in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection.

      Strengths:

      There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure.

      We thank the reviewer for this positive evaluation.

      Weaknesses:

      Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials.

      The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added “(before baseline correction)” to the legend of Figure 3.

      I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories.

      We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. 

      We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C).

      We added the following figure to the supplementary information.

      The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated).

      Thank you for spotting this; it has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. See the following comments for the methods that could benefit the present conclusions:

      Thank you for these suggestions that we believe improved our interpretations.

      Major Points

      (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures.

      We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      and this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.

      We also updated the discussion:

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions.

      We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room.

      (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example.

      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. 

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in the revised methods:

      To deal with possibly non-parametric distributions, we used Wilcoxon rank sum test or sign test instead of t-tests to test differences between distributions. We used permutation tests instead of Binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.
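
      The two kinds of permutation test named above can be illustrated with a short sketch. It assumes the mean as test statistic and a two-sided p-value, which the response does not specify; the function names, the seed, and the "+1" correction are choices made here for illustration, and the per-time-point cluster logic of the actual analysis is omitted.

```python
import random
from statistics import mean

def sign_permutation_test(diffs, n_perm=1000, seed=0):
    """Paired test: randomly flip the sign of each trial-wise difference."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    # Counting the observed statistic itself keeps the p-value above zero.
    return (count + 1) / (n_perm + 1)

def trial_permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sample test: shuffle trial labels between the two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical firing-rate differences (Hz), not data from the study.
print(sign_permutation_test([1.0] * 10))
print(trial_permutation_test([0.0] * 10, [10.0] * 10))
```

      The sign-flip version suits paired data, such as a post-stimulus rate minus its own baseline on each trial, whereas label shuffling suits unpaired groups such as hit and miss trials.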

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods ?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity from catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate and we discarded all but the longest cluster.
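
      The cluster-length criterion in the two paragraphs above can be sketched as follows. This shows only the final thresholding step, given a per-time-point significance mask; the sampling step `dt_ms` and the function names are assumptions made here, and the comparison against shuffled data is left out.

```python
def longest_cluster_ms(significant, dt_ms):
    """Duration (ms) of the longest run of consecutive significant time points."""
    longest = current = 0
    for flag in significant:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest * dt_ms

def is_selective(significant, dt_ms=1.0, min_length_ms=80.0):
    """Apply the 80 ms criterion (twice a 40 ms smoothing-kernel SD)."""
    return longest_cluster_ms(significant, dt_ms) > min_length_ms

# Hypothetical mask over 400 ms at an assumed 1 ms resolution:
# one 100 ms cluster and one shorter 30 ms cluster.
mask = [False] * 150 + [True] * 100 + [False] * 50 + [True] * 30 + [False] * 70
print(longest_cluster_ms(mask, 1.0))  # 100.0
print(is_selective(mask))             # True
```

      Keeping only the longest cluster, as the text describes, means a neuron with several short significant stretches is not counted unless one of them alone exceeds the threshold.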

      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text “[...], duration: 2.5 ms” in Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along these lines: “these two subcortical structures both contain neurons displaying several different functional roles”.

      Changed.

      Line 329: remove double “when”

      We made the change, thank you for spotting this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The Introduction and Discussion have been modified and toned down, and they now focus on discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but there were at least 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. Explained above and in the methodology. Hatchability models with a random-effect variable were poorly fitted and validated. The hatchability model included a four-way interaction (season, blood source, cycle and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.
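
      The reviewer's suggested wording ("sufficient to detect the observed effect with a statistical power of 0.8") can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a simple two-group comparison with a normal approximation, and the effect size used in the test below (a standardized difference of 0.5) is purely hypothetical.

```python
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (hypothetical input,
    not a value taken from the manuscript).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def min_n_for_power(effect_size: float, target: float = 0.8,
                    alpha: float = 0.05, max_n: int = 100_000):
    """Smallest per-group sample size reaching the target power, or None."""
    for n in range(2, max_n + 1):
        if two_sample_power(effect_size, n, alpha) >= target:
            return n
    return None
```

      Because the effect size is an input, any power obtained this way holds only for that assumed effect, which is the caveat the reviewer asks the authors to state.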

      246. Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: This list of 3 explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations of our results were also discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From guidelines: "Media Policy FAQs Data Availability Purpose and General Principles To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why it presents the 24 experiments.
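
      The two counts in this exchange (8 treatments versus 24 experiments) follow from the design described in the reviews: two hosts, two seasons, two gonotrophic cycles, and three replicates. The short sketch below only enumerates those combinations; the variable names are illustrative.

```python
from itertools import product

# Factor levels as described in the reviews.
hosts = ["chicken", "mouse"]
seasons = ["summer", "autumn"]
cycles = ["I", "II"]
replicates = [1, 2, 3]

# Treatments: every host x season x gonotrophic-cycle combination.
treatments = list(product(hosts, seasons, cycles))

# Experiments: each treatment repeated in every replicate.
experiments = list(product(hosts, seasons, cycles, replicates))

print(len(treatments), len(experiments))
```

      Treating replicate as a random effect leaves 8 treatments, while reporting each replicate separately gives the 24 rows in the tables.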

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Tian et al. explore the role of ubiquitination of non-structural protein 16 (nsp16) in the SARS-CoV-2 life cycle. nsp16, in conjunction with nsp10, performs the final step of viral mRNA capping through its 2'-O-methylase activity. This modification allows the virus to evade host immune responses and protects its mRNA from degradation. The authors demonstrate that nsp16 undergoes ubiquitination and subsequent degradation by the host E3 ubiquitin ligases UBR5 and MARCHF7 via the ubiquitin-proteasome system (UPS). Specifically, UBR5 and MARCHF7 mediate nsp16 degradation through K48- and K27-linked ubiquitination, respectively. Notably, degradation of nsp16 by either UBR5 or MARCHF7 operates independently, with both mechanisms effectively inhibiting SARS-CoV-2 replication in vitro and in vivo. Furthermore, UBR5 and MARCHF7 exhibit broad-spectrum antiviral activity by targeting nsp16 variants from various SARS-CoV-2 strains. This research advances our understanding of how nsp16 ubiquitination impacts viral replication and highlights potential targets for developing broadly effective antiviral therapies.

      Strengths:

      The proposed study is of significant interest to the virology community because it aims to elucidate the biological role of ubiquitination in coronavirus proteins and its impact on the viral life cycle. Understanding these mechanisms will address broadly applicable questions about coronavirus biology and enhance our overall knowledge of ubiquitination's diverse functions in cell biology. Employing in vivo studies is a strength.

      Weaknesses:

      Minor comments:

      Figure 5A- The authors should ensure that the figure is properly labeled to clearly distinguish between the IP (Immunoprecipitation) panel and the input panel.

      Thank you for your suggestion. We have replaced Figure 5 in this version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript "SARS-CoV-2 nsp16 is regulated by host E3 ubiquitin ligases, UBR5 and MARCHF7" is an interesting work by Tian et al. describing the degradation/ stability of NSP16 of SARS CoV2 via K48 and K27-linked Ubiquitination and proteasomal degradation. The authors have demonstrated that UBR5 and MARCHF7, an E3 ubiquitin ligase bring about the ubiquitination of NSP16. The concept, and experimental approach to prove the hypothesis looks ok. The in vivo data looks ok with the controls. Overall, the manuscript is good.

      Strengths:

      The study identified important E3 ligases (MARCHF7 and UBR5) that can ubiquitinate NSP16, an important viral factor.

      Comments on revisions:

      I have gone through the revised form of the manuscript thoroughly. The authors have addressed all of my concerns. To me, the experimental approach looks convincing that the host E3 ubiquitin ligases (UBR5 and MARCHF7) ubiquitinate NSP16 and mark it for proteasomal degradation via K48- and K27-linkage. The authors have represented the final figure (Fig.8) in a convincing manner, opening a new window to explore the mechanism of capping the vRNA by NSP16.

      Thank you for your recognition.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary, and Strengths:

      The authors and their team have investigated the role of Vimentin Cysteine 328 in epithelial-mesenchymal transition (EMT) and tumorigenesis. Vimentin is a type III intermediate filament, and cysteine 328 is a crucial site for interactions between vimentin and actin. These interactions can significantly influence cell movement, proliferation, and invasion. The team has specifically examined how Vimentin Cysteine 328 affects cancer cell proliferation, the acquisition of stemness markers, and the upregulation of the non-coding RNA XIST. Additionally, functional assays were conducted using both wild-type (WT) and Vimentin Cysteine 328 mutant cells to demonstrate their effects on invasion, EMT, and cancer progression. Overall, the data supports the essential role of Vimentin Cysteine 328 in regulating EMT, cancer stemness, and tumor progression. Overall, the data and its interpretation are on point and support the hypothesis. I believe the manuscript has great potential.

      The authors are thankful to the reviewers for carefully reading the manuscript, evaluating the data, making positive comments, and supporting our conclusions.

      Weaknesses:

      Minor issues are related to the visibility and data representation in Figures 2E and 3 A-F

      We have revised the figures (Figure 2E and Figure 3A-F) to increase the data visibility.

      Reviewer #2 (Public review):

      The aim of the investigation was to find out more about the mechanism(s) by which the structural protein vimentin can facilitate the epithelial-mesenchymal transition in breast cancer cells.

      The authors focussed on a key amino acid of vimentin, C328, its role in the interaction between vimentin and actin microfilaments, and the downstream molecular and cellular consequences. They model the binding between vimentin and actin in silico to demonstrate the potential involvement of C328, but the outcome is described vaguely.

      We have expanded the discussion of these results in the manuscript to more explicitly describe the critical role of C328 in the vimentin-actin interaction. Specifically, we highlight that C328 lies within a region of the vimentin rod domain known to mediate key protein-protein interactions. Our modeling shows that the thiol group of C328 enables specific hydrogen bonding and potential disulfide-mediated interactions with actin, which are disrupted upon mutation to serine. These findings provide mechanistic insight into the functional importance of this residue.

      The phenotype of a non-metastatic breast cancer cell line MCF7, which doesn't express vimentin, could be changed to a metastatic phenotype when mutant C328S vimentin, but not wild-type vimentin, was expressed in the cells. Expression of vimentin was confirmed at the level of mRNA, protein, and microscopically. Patterns of expression of vimentin and actin reflected the distinct morphology of the two cell lines. Phenotypic changes were assessed through assay of cell adhesion, proliferation, migration, and morphology and were consistent with greater metastatic potential in the C328S MCF7 cells. Changes in the transcriptome of MCF7 cells expressing wild-type and C328S vimentins were compared and expression of Xist long ncRNA was found to be the transcript most markedly increased in the metastatic cells expressing C328S vimentin. Moreover, changes in expression of many other genes in the C328S cells are consistent with an epithelial mesenchymal transition. Tumourigenic potential of MCF7 cells carrying C328S, but not wild-type, vimentin was confirmed by inoculation of cells into nude mice. This assay is a measure of the stem-cell quality of the cells and not a measure of metastasis. It does demonstrate phenotypic changes that could be linked to metastasis.

      shRNA was used to down-regulate vimentin or Xist in the MCF7 C328S cells. The description of the data is limited in parts and data sets require careful scrutiny to understand the full picture. Down-regulation of vimentin reversed the morphological changes to some degree, but down-regulation of Xist didn't.

      This is understandable given the fact that vimentin interacts with actin which is known to determine cell shape. XIST being a non-coding RNA will not have the same effect.

      Conversely, down-regulation of XIST inhibited cell growth, a sign of reversing metastatic potential, but down-regulation of vimentin had no effect on growth.

      XIST is known to be induced in a number of cancers (see Figure 3E), which is consistent with our observation that its downregulation inhibits cell growth. However, downregulation of vimentin had no effect on growth, which is consistent with our previously published observation that ectopic expression of wildtype vimentin in MCF-7 cells did not influence cell growth (Usman et al Cells 2022, 11(24), 4035; https://doi.org/10.3390/cells11244035).

      Down-regulation of either did inhibit cell migration, another sign of metastatic reversal.

      We have previously shown that ectopic expression of wildtype vimentin in MCF-7 cells stimulates cell migration due to downregulation of CDH5 (endothelial cadherins) (Usman et al Cells 2022, 11(24), 4035). Therefore, downregulation of vimentin is expected to inhibit cell migration, which is what we observed in this study. Why downregulation of XIST inhibited cell migration is not clear. It is conceivable that XIST downregulation affects Lamin expression, which may suppress intercellular interactions to increase cell migration. This hypothesis is supported by the fact that vimentin expression in MCF-7 cells affects Lamin expression (Usman et al Cells 2022, 11(24), 4035).

      The interpretation of this type of experiment is handicapped when full reversal of expression is not achieved, as was the case in this study.

      Full reversal of any biological effect is almost impossible to achieve because shRNAs by nature are not 100% effective. This can, however, be tested using CRISPR-Cas9 gene editing to completely knock out a protein (this can’t be used for XIST as it is a non-coding RNA). In that case, one has to assume that it will have no off-target effect.

      Overall the study describes an intriguing model of metastasis that is worthy of further investigation, especially at the molecular level to unravel the connection between vimentin and metastasis. The identification of a potential role for Xist in metastasis, beyond its normal role in female cells to inactivate one of the X chromosomes, corroborates the work of others demonstrating increased levels in a variety of tumours in women and even in some tumours in men. It would be of great interest to see where in metastatic cells Xist is expressed and what it binds to.

      The authors fully agree that it is an interesting model of metastasis/oncogenesis that requires further investigation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included data from HER2 positive breast cancer patients who received trastuzumab in the I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset. We highlighted the use of “pre-treatment breast cancer tumors” in Line 309 and included an overview of the transcriptomic data analysis of the I-SPY2 trial in Figure S1F.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree with your opinion that exploring multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we will continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address clinical key issues. In our future studies, we will continue to investigate the association between the invasive and metastatic capabilities of trastuzumab resistant HER2 positive breast cancer and cysteine metabolism.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We added statistical information in our figure legends, including Line 849-850, Line 865-867, Line 881-882, Line 898-900, Line 910-911 and Line 923-924.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We have modified this figure to make it clearer. In Figure 3K, the significance was determined by one-way ANOVA and the comparison presented was relative to the DMSO control. The results indicated that combining erastin or cysteine starvation with trastuzumab increased lipid peroxidation, although trastuzumab monotherapy did not induce ferroptosis.

      Additionally, the combination of erastin and trastuzumab resulted in more lipid peroxidation than erastin alone. Similar results were found for the combination of cysteine starvation and trastuzumab. These results showed that targeting cysteine metabolism plus trastuzumab could have synergistic effects in inducing ferroptosis in trastuzumab resistant HER2 positive breast cancer.
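
      For readers unfamiliar with the test named above, the following minimal sketch computes the one-way ANOVA F statistic from grouped measurements. It is illustrative only: the function name is an assumption, it does not compute a p-value, and it does not perform the follow-up comparison against the DMSO control, whose method is not stated in the response.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups).

    groups is a list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

      A large F value indicates that differences between condition means are large relative to the variation within conditions; which conditions differ from the control still requires a separate post-hoc comparison.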

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative of trastuzumab-resistant cell line, and SKBR3 as a representative of trastuzumab sensitive cell line. As such, these findings could be cell-line specific while irrelevant to trastuzumab sensitivity at all. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback will help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab-resistant) and SKBR3 (trastuzumab-sensitive) cells, and this is a limitation of our study. Validation in additional cell lines would make our findings more persuasive. In future research, we will continue to optimize our experimental design and methods to make our findings more comprehensive.

      (2) Ferroptosis in our research was mainly detected by evaluating lipid peroxidation. Experiments measuring cell viability and rescue effects provide additional evidence.

      We used CCK8 assays to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different durations of cysteine starvation. JIMT1 was more sensitive to erastin and RSL3 but tolerant to cysteine starvation, consistent with the earlier lipid peroxidation tests. These data are included in Figure S5C-E. We added the description on Lines 375-379.

      In addition, we performed experiments to explore the rescue effects of the ferroptosis inhibitor Fer-1. Fer-1 suppressed the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provides further evidence that cysteine metabolism plays a vital role in modulating ferroptosis in HER2-positive breast cancer. These data are included in Figure S5G and S5H. We added the description on Lines 387-391.

      (3) In the xenograft experiments, cysteine starvation was performed by feeding a cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Lines 236-237 in Methods.

      We agree with your opinion on the use of erastin in in vivo experiments. We tried to optimize drug dissolution and other conditions by referring to the relevant literature. We will continue to improve our experimental design and methods.

      (4) Epigenetic modifications have been recognized as crucial factors in the development of drug resistance. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. The role of epigenetic changes in the development of trastuzumab resistance in HER2-positive breast cancer is still being explored. We investigated the dysregulation of histone modifications and DNA methylation in trastuzumab-resistant HER2-positive breast cancer. Our findings indicate that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This provides more evidence for exploring trastuzumab resistance mechanisms. We have provided a detailed discussion on Lines 598-607.

      We would like to extend our appreciation for your constructive suggestions, and we will continue to improve our research in future experiments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Line 334: it would be helpful to clarify that JIMT1 cells are trastuzumab-resistant while SKBR3 cells are trastuzumab sensitive, especially for those not familiar with breast cancer cell lines.

      Thank you for your valuable recommendations. We added a description of SKBR3 as trastuzumab-sensitive and JIMT1 as trastuzumab-resistant on Lines 334-335.

      (2) Figure 3: the concentrations of erastin and RSL3 should be indicated.

      Thank you for your valuable recommendations. In Figure 3, the concentration of erastin was 10 μM and that of RSL3 was 1 μM. We added these details to the figure legends on Lines 872-873.

      (3) Figure 3: lipid peroxidation does not necessarily mean ferroptosis. Cell viability data and rescuing effects of ferroptosis inhibitors should be shown.

      Thank you for your valuable recommendations. As mentioned above, we used CCK8 assays to compare the viability of JIMT1 and SKBR3 cells at different erastin and RSL3 concentrations, as well as after different durations of cysteine starvation. Consistent with the lipid peroxidation tests, JIMT1 was more sensitive to erastin and RSL3 but tolerant to cysteine starvation. These data are included in Figure S5C-E. We added the description on Lines 375-379.

      As described above, we also performed experiments to explore the rescue effects of the ferroptosis inhibitor Fer-1. Fer-1 suppressed the lipid peroxidation resulting from erastin, RSL3 and cysteine starvation in both JIMT1 and SKBR3. This provides further evidence that cysteine metabolism plays a vital role in modulating ferroptosis in HER2-positive breast cancer. These data are included in Figure S5G and S5H. We added the description on Lines 387-391.

      (4) Figure 3H: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. For cysteine starvation in cell culture, we used cystine/cysteine-deficient DMEM (BIOTREE) with 1% penicillin-streptomycin at 37℃ and 5% CO2. We added details of this medium on Lines 141-143 in Methods.

      (5) Figure 4: the meaning of "H" should be clarified.

      Thank you for your valuable recommendations. "H" denotes trastuzumab. We clarified the meaning of “H” in the figure legends on Line 898.

      (6) Figure 4B & 4C: the data of "H" group and "Erastin" group are inconsistent.

      Thank you for your valuable recommendations. In the in vivo experiments, tumor volume changes were analyzed using a paired approach, comparing the tumor size of each individual mouse before and after treatment. We recognize the confusion this caused and have added more details about our in vivo experiments on Line 240 in Methods and Lines 892-893 in the figure legends.
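      To make the idea of a paired comparison concrete, the following is a minimal sketch in Python. It assumes a paired t statistic computed on per-mouse differences; the response does not state which paired test was used, and the volumes below are made-up illustrative numbers, not data from the study. The function name `paired_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired before/after measurements.

    Each position in `before` and `after` must refer to the same animal.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if len(before) < 2:
        raise ValueError("at least two pairs are needed")

    # Work on within-animal differences, so between-animal variation
    # in baseline tumor size drops out of the comparison.
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")

    standard_error = sd / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Illustrative (invented) tumor volumes in mm^3 for four mice.
volumes_before = [100.0, 120.0, 90.0, 110.0]
volumes_after = [130.0, 150.0, 100.0, 160.0]
t_value, dof = paired_t_statistic(volumes_before, volumes_after)
print(f"t = {t_value:.3f}, df = {dof}")
```

      Because each mouse serves as its own control, group means of absolute tumor size can look inconsistent with a paired result, which may be the source of the confusion the reviewer noted.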

      (7) Figure 4: how cysteine starvation was performed should be clarified in the Methods section.

      Thank you for your valuable recommendations. We performed cysteine starvation using a cystine/cysteine-deficient diet (Xietong Bio). We added details of this diet on Lines 236-237 in Methods.

      We have also corrected some grammatical errors in the manuscript. We would like to extend our great appreciation to all editors and reviewers for their invaluable contributions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Summary of revisions:

      Thanks to the careful review and comments from the reviewers, we restructured the introduction and the discussion to improve clarity and better contextualise the findings. Notably, we further discuss the observed f<sub>sphere</sub> decrease in the cerebellum and the Tau-specific findings (Tau being a possible marker of Purkinje cell development, and Tau switching compartments in the thalamus). We added material to the Supplementary Information to support these discussion points. We added a figure showing the metabolic profiles normalised by water or by macromolecules, and a figure and table giving a rough approximation of f<sub>sphere</sub> based on existing literature. We report the DTI results for thoroughness.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Ligneul and coauthors implemented diffusion-weighted MRS in young rats to follow longitudinally and in vivo the microstructural changes occurring during brain development. Diffusion-weighted MRS is here instrumental in assessing microstructure in a cell-specific manner, as opposed to the claimed gold-standard (manganese-enhanced MRI) that can only probe changes in brain volume. Differential microstructure and complexification of the cerebellum and the thalamus during rat brain development were observed noninvasively. In particular, lower metabolite ADC with increasing age were measured in both brain regions, reflecting increasing cellular restriction with brain maturation. Higher sphere (representing cell bodies) fraction for neuronal metabolites (total NAA, glutamate) and total creatine and taurine in the cerebellum compared to the thalamus were estimated, reflecting the unique structure of the cerebellar granular layer with a high density of cell bodies. Decreasing sphere fraction with age was observed in the cerebellum, reflecting the development of the dendritic tree of Purkinje cells and Bergmann glia. From morphometric analyses, the authors could probe non-monotonic branching evolution in the cerebellum, matching 3D representations of Purkinje cells expansion and complexification with age. Finally, the authors highlighted taurine as a potential new marker of cerebellar development.

      From a technical standpoint, this work clearly demonstrates the potential of diffusion-weighted MRS for probing microstructural changes of the developing brain non-invasively, paving the way for its application in pathological cases. Ligneul and coauthors also show that diffusion-weighted MRS acquisitions in neonates are feasible, despite the known technical challenges of such measurements, even in adult rats. They also provide all necessary resources to reproduce and build upon their work, which is highly valuable for the community.

      From a biological standpoint, claims are well supported by the microstructure parameters derived from advanced biophysical modelling of the diffusion MRS data. The assumption of metabolite compartmentation, forming the basis of cell-specific microstructure interpretation of dMRS data, remains debated and should be considered with care (Rae, Neurochem Res, 2014, https://doi.org/10.1007/s11064-013-1199-5). External cross-validation of some of the authors' claims, in particular taurine in the thalamus switching from neurons to astrocytes during brain development, would be a highly valuable addition to this study.

      R1.1: We understand the reviewer's concerns. Metabolic compartmentation is not a one-to-one correspondence. Although we interpret the results in the light of metabolic compartmentation, our results are not driven by this assumption. We could not perform a direct cross-validation of the taurine switch in the thalamus, but we now clarify in the discussion why the dMRS results themselves indicate a switch, and we integrate our results better with existing literature on taurine. We now discuss this in more detail for the cerebellar results too.

      Specific strengths:

      (1) The interpretation of dMRS data in terms of cell-specific microstructure through advanced biophysical modelling (e.g. the sphere fraction, modelling the fraction of cell bodies versus neuronal or astrocytic processes) is a strong asset of the study, going beyond the more commonly used signal representation metrics such as the apparent diffusion coefficient, which lacks specificity to biological phenomena.

      (2) The fairly good data quality despite the complexity of the experimental framework should be praised: diffusion-weighted MRS was acquired in two brain regions (although not in the same animals) and longitudinally, in neonates, including data at high b-values and multiple diffusion times, which altogether constitutes a large-scale dataset of high value for the diffusion-weighted MRS community.

      (3) The authors have shared publicly data and codes used for processing and fitting, which will allow one to reproduce or extend the scope of this work to disease populations, and which goes in line with the current effort of the MR(S) community for data sharing.

      Specific weaknesses:

      (1) This work lacks an introduction and a discussion about diffusion MRI, which is already a validated technique to assess brain development non-invasively. Although water lacks cell-specificity compared to metabolites, several studies have reported a decrease in water ADC and increased fractional anisotropy with brain maturation, associated with the myelination process and decreased water content (overview in Hüppi, Chapt. 30 of "Diffusion MRI: Theory, Methods, and Applications", Oxford University Press, 2010). Interestingly, the same observations are found in this work (decreased ADC with age for most metabolites in both brain regions), which should have been commented on. Moreover, the authors could have reported water diffusion properties in addition to metabolites', as I believe the water signal, used for coil combination and/or eddy current correction, is usually naturally acquired during diffusion-weighted MRS scans.

      R1.2: Thank you for these helpful suggestions. We have now improved our introduction of the various modalities, and we contextualise the study in light of previous DTI findings, as suggested by the reviewer. We agree with the reviewer that the comparison with previous human DTI is relevant, and we now mention it at the beginning of the discussion. However, the very different nature of the dMRS signal compared to dMRI (intracellular and absence of exchange for metabolites) prevents us from drawing any strong conclusions.

      (2) It is unclear why the authors have normalized metabolite concentrations (measured from low b-values diffusion-weighted MRS spectra) to the macromolecule concentrations. First, it is not specified whether in vivo macromolecules were acquired at each age or just at one time point. Second, such ratios are not standard practice in the MRS community so this choice should have been explained. Third, the macromolecule content was reported to change with age (Tkac et al., Magn Reson Med, 2003), therefore a change in metabolite to macromolecule ratio with age cannot be interpreted unequivocally.

      R1.3: We agree with the reviewer that this needed further explanation. We now clarify in the Results section “Metabolic profile changes with age” the reasoning behind choosing macromolecules for normalisation. We also added to the Supplementary Information the changes in metabolite concentrations with age when normalising by water, and a direct comparison with MM normalisation (Figure S2).

      (3) Some discussion is missing about the choice of the analytical biophysical model (although a few are compared in Supplementary Materials), in particular: is a model of macroscopic anisotropy relevant in cerebellum, made of a large fraction of oriented white matter tracts, and does the model remain valid at different ages given white matter maturation and the ongoing myelination process?

      R1.4: We agree with the reviewer that this is a valid concern. We did acquire some standard DTI at the end of the acquisition sessions (where possible), with fibre dispersion estimation in mind. However, data could not be acquired in all animals, and the data quality was poor (see Figure S8; the experimental conditions would have required further optimisation). We have added a couple of sentences at the beginning and at the end of the discussion to address this limitation, and we include the DTI data in the Supplementary Information.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to non-invasively track neuronal development in rat neonates, which they achieved with notable success. However, the direct relationship between the results and broader conclusions regarding developmental biology and potential human implications is somewhat overstretched without further validation.

      Strengths:

      If adequately revised and validated, this work could have a significant impact on the field, providing a non-invasive tool for longitudinal studies of brain development and neurodevelopmental disorders in preclinical settings.

      Weaknesses:

      (1) Consistency and Logical Flow:

      The manuscript suffers from a lack of strategic flow in some sections. Specifically, transitions between major findings and methodological discussions need refinement to ensure a logical progression of ideas. For example, the jump from the introduction of developmental trajectories and the technicalities of MRS (Magnetic Resonance Spectroscopy) processing on page 3 could benefit from a bridging paragraph that explicitly states the study's hypotheses based on existing literature gaps.

      R2.1: Thank you for this general feedback (along with your point (3)) that helped us restructure the introduction and the discussion to improve the clarity and flow.

      (2)  Scientific Rigour:

      While the novel application of diffusion-weighted MRS is commendable, there's a notable gap in the rigorous validation of this approach against gold-standard histological or molecular techniques. Particularly, the assertions regarding the sphere fraction and morphological changes inferred from biophysical modelling mandate direct validation to solidify the claims made. A study comparing these in vivo findings with ex vivo confirmation in at least a subset of samples would significantly enhance the reliability of these conclusions.

      R2.2: We agree with the reviewer that this would have been a great addition to the manuscript. Although we could not run new experiments to address these flaws, we now discuss the results more quantitatively, leaning on existing literature (addition of Figure S11 and Table S2). This helps us understand the results around Tau in both regions better, and illustrate the R<sub>sphere</sub> trend.

      (3) Clarity and Novelty:

      - The manuscript often delves deeply into technical specifics at the expense of accessibility to readers not deeply familiar with MRS technology. The introduction and discussions would benefit from a clearer elucidation of why these specific metabolite markers were chosen and their known relevance to neuronal and glial cells, placing this in the context of what is novel compared to existing literature.

      - The novelty aspect could be reinforced by a more structured discussion on how this method could change the current understanding or practices within neurodevelopmental research, compared to the current state of the art.

      R2.3: See answer to (1). By restructuring the introduction and the discussion, we hope to have addressed this point. We now discuss how these findings compare to the state of the art (notably added comparison with dMRI research). Along with the next comment, we better discuss potential implications of these findings for neurodevelopmental research.

      (4) Completeness:

      - The Discussion section requires expansion to offer a more comprehensive interpretation of how these findings impact the broader field of neurodevelopment and psychiatric disorders. Specifically, the implications for human studies or clinical translation are touched upon but not fully explored.

      - Further, while supplementary material provides necessary detail on methodology, key findings from these analyses should be summarized and discussed in the main text to ensure the manuscript stands complete on its own.

      R2.4: Thank you for these helpful suggestions. We now integrate the findings better into the existing literature. We notably discuss how the results might translate to humans.

      (5) Grammar, Style, Orthography:

      There are sporadic grammatical and typographical errors throughout the text which, while minor, detract from the overall readability. For example, inconsistencies in metabolite abbreviations (e.g., tCr vs Cr+PCr) should be standardized.

      R2.5: Thank you for the careful review. This has been corrected.

      (6) References and Additional Context:

      The current reference list is extensive but lacks integration into the narrative. Direct comparisons with existing studies, especially those with conflicting or supportive findings, are scant. More dedicated effort to contextualize this work within the existing body of knowledge would be beneficial.

      R2.6: Because the nature of this work is novel, it is difficult to find directly conflicting/similar works. However, we now integrate the findings into the broader literature.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Thank you for the careful review. We have addressed most of the minor comments, except for the last one, which we discuss below.

      - Some figures appear blurred in the printed PDF

      - Introduction: "constrained and hindered by cell membranes," - maybe use "restricted" instead of "constrained", like everywhere else in the text

      - Introduction: "(typically ~8cm3 vs ~8mm3 in dMRI in humans)" - here I suggest to put the rat brain sizes instead to help the reader understand how small the voxel was at P5 in this study, thus explaining the challenges

      - Fig 1 - numbers 1 and 2 on panel A,B should be clarified and they do not match 1 and 2 on panel C, which is confusing

      - Fig 2 - I am guessing the large dots are the mean and small are individual data points? Please clarify

      - Please specify "Relative CRLB" rather than just "CRLB", in supp. mat as well

      - Fig 3 - title of panel B, I would change "signal" into "concentration"

      - Fig 3 - end of caption: "and levelled to get Signal(tCr,P30)/Signal(MM,P30)=8", I think "in the thalamus" is missing

      - The results section "Biophysical modelling underlines different developmental trajectories of cell microstructure between the cerebellum and the thalamus" is sometimes imprecise, e.g.: "Cerebellum: The sphere fraction and the radius estimated from tNAA diffusion properties vary with age.", but the tNAA sphere fraction seems to vary more with age in the thalamus according to Table 1; "Cerebellum: fsphere decreases from 0.63 (P10) to 0.41 (P30), but R is stable": this is for tCr, I presume

      - Table 1 - "pvalues" please add "before multiple comparison correction"

      - Figure 5 - Panel B, the L-segment subpanel is unclear -which metabolites is it referring to? Why does Tau have a * in panel A?

      - Update Ref 37 to the journal version

      - Methods: "A STELASER (Ligneul et al., MRM 2017) sequence", add numbered reference instead

      - Please specify that the DIVE toolbox uses Gaussian phase distribution approximation, it is important for the dMRS reader given that your diffusion gradient length is long and cannot be neglected, and that the SGP approximation does not apply.

      The Gaussian phase distribution approximation and the SGP approximation are two different concepts. The gradient duration δ (7 ms) is short compared to the gradient separation Δ (100 ms), but it could still be considered too long for the SGP approximation to hold. However, the gradient duration is accounted for in DIVE in any case.
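      As a rough illustration of how much a finite gradient duration matters for the timings quoted above (duration 7 ms, separation 100 ms), the sketch below compares the standard Stejskal-Tanner b-value for rectangular gradient pulses, b = γ²G²δ²(Δ − δ/3), with its short-pulse limit, b = γ²G²δ²Δ. This is a free-diffusion calculation only and is not taken from the DIVE toolbox. The gradient amplitude is an arbitrary assumed value, and the function names are hypothetical. It does not address whether the short-pulse approximation holds for restricted diffusion, which also depends on compartment size and diffusivity.

```python
GAMMA_PROTON = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1


def b_value_finite_pulse(gradient, duration, separation, gamma=GAMMA_PROTON):
    """Stejskal-Tanner b-value (s/m^2) for rectangular pulses of finite duration."""
    q = gamma * gradient * duration
    return q ** 2 * (separation - duration / 3.0)


def b_value_short_pulse(gradient, duration, separation, gamma=GAMMA_PROTON):
    """b-value (s/m^2) in the short-gradient-pulse limit (duration -> 0)."""
    q = gamma * gradient * duration
    return q ** 2 * separation


duration_s = 7e-3      # gradient duration from the response
separation_s = 100e-3  # gradient separation from the response
gradient_t_per_m = 0.5  # assumed amplitude, chosen only for illustration

finite = b_value_finite_pulse(gradient_t_per_m, duration_s, separation_s)
short = b_value_short_pulse(gradient_t_per_m, duration_s, separation_s)

# The ratio is independent of the gradient amplitude: 1 - duration / (3 * separation).
ratio = finite / short
print(f"effective diffusion time: {(separation_s - duration_s / 3) * 1e3:.2f} ms")
print(f"finite-pulse / short-pulse b-value ratio: {ratio:.4f}")
```

      With these timings, the finite-pulse correction to the free-diffusion b-value is a little over 2%, which is consistent with the statement that the gradient duration is short relative to the separation yet still worth accounting for.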

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C-Cdc20 ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or to apply peptides as portable motifs to achieve targeted degradation, both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility. We have clarified this statement (page 11) to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section has been clarified (page 16). The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      The section now reads:

      “We therefore assume that this is the reason for the lack of observed density in this region of the peptides D20 and D21 (Fig. S3E and S3F, respectively). We believe that it causes a reduction in binding affinities of all peptides in crystallo, given the evidence from SPR highlighting a role of position 7 in the interaction (Table 1). Interestingly, the observed electron density of the peptide correlates with Cdc20 binding affinity: D21 and D20, having the highest affinities, display the clearest electron density allowing six amino acids to be modeled, whereas D7 shows relatively poor density permitting modelling of only four residues. For D19, the lack of density observed likely reflects its intrinsically weaker affinity compared to the other peptides, in addition to losing the interactions from position 7 due to crystal packing.”

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by homing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in the melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      We have added the following text to the Results section “Design of D-box peptides” (page 10):

      “We focused on D-box peptides, as there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study that tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated (Qin et al. 2017). They observed that, whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study (Hartooni et al. 2022) of binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.”

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of the yeast ‘pseudo-substrate’ inhibitor Acm1 have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to the APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022), we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of the yeast ‘pseudo-substrate’ inhibitor Acm1 have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to the APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022), we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) On page 12 (towards the end), the author stated D10 contained an A3P mutation, they meant P3A right? 'To test this hypothesis, we proceeded to synthesise D10, a derivative of D4 containing an A3P single point mutation.'

      We thank the reviewer for spotting this typo, which we have corrected.

      (2) Have the authors considered other orthogonal approaches to cross-examine/validate binding affinities? That said, I do not think extra experiments are necessary.

      We did not explore further orthogonal approaches due to the challenges of producing sufficient amounts of the Cdc20 protein. Due to the low affinities of many peptides for Cdc20, many techniques would have required more protein than we were able to produce. We believe that the qualitative TSA combined with the SPR is sufficient to convince the readers; indeed, there is a correlation between SPR-determined binding affinities and the thermal shifts. For the natural amino acid-containing peptides (Table 1), D19 has the highest affinity and causes the largest thermal shift in the Cdc20 melting temperature, D10 has the lowest affinity and causes the smallest thermal shift, and D1, D3, D4, and D5 all rank in the middle by both techniques. For those peptides containing unnatural amino acids (Table 2), again higher affinities are reflected in larger thermal shifts.

      Reviewer #2 (Recommendations for the authors):

      The data seem fine to me. I would appreciate a little more detail on the points mentioned in the public review. Also a thorough reread, maybe by a disinterested party as there are various typos that could be corrected - all in all an excellent clear paper that encompasses a lot of work.

      A colleague has carefully checked the manuscript, and typos have been corrected.

    1. Author response:

      We wish to express our gratitude to the reviewers for their insightful and constructive comments on the initial version of our manuscript. We greatly value their observations and have every intention of addressing their remarks in a thorough and constructive manner. Based on the editors’ and reviewers’ feedback, we realize that it was not entirely clear that we intended this work primarily to be a resource and not yield strong insights into DNN-human alignment. Since our method also covers the broad range of natural objects - as used in the vast majority of studies on object processing - we also feel we did not sufficiently highlight the breadth of the tool. Based on the editors’ assessment, our explorations into the limits of the method - which we saw as a strength, not a weakness of our work - perhaps overshadowed the otherwise broad applicability somewhat. We hope to clarify this in the revised manuscript. Beyond these general points, we would like to address the following four points:

      • Where feasible, we intend to undertake additional analyses and refine existing ones. For instance, we plan to provide noise ceilings for all datasets where such calculations are possible, and we plan to give careful consideration to implementing a permutation or label-shuffling test to explore some of the ideas shared by the reviewers.

      • We plan to discuss more thoroughly several topics raised by the reviewers (e.g., how our approach might contend with different experimental situations, such as when using line drawings as stimuli).

      • We aim to enhance the clarity of our manuscript throughout. This will include refining the wording of our abstract and offering a more detailed explanation of the methods employed in the fMRI analyses.

      • We plan to elaborate further on our line of reasoning by addressing potential sources of misunderstanding—such as clarifying what we mean by a “lack of data” and providing greater detail regarding the nature of the 49-dimensional embedding.
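
      The label-shuffling test mentioned in the first point above can be sketched generically as follows. This is only a minimal illustration of the technique, not the analysis planned for the revision: the test statistic (absolute difference in group means), the function name `permutation_p_value`, and the default number of permutations are all assumptions chosen for the example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided label-shuffling test on the difference in group means.

    Hypothetical helper for illustration; the statistic and defaults are assumptions.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        # Absolute difference between the mean of the first n_a values and the rest.
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if mean_difference(pooled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

      In such a test, the observed statistic is compared against the distribution obtained when the group labels carry no information, so a small p-value indicates that the observed difference is unlikely under random labeling.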

    1. Author response:

      The evidence supporting this mechanism is incomplete, with additional work needed to clarify SHP-1's role, the contribution of Fc receptor crosslinking, and the biological relevance across normal and malignant B cells. 

      We will address these points by:

      - including SHP-1 inhibitors in the DuoHexaBody-CD37 cytotoxicity experiments to address the role of SHP-1

      - investigating which Fc receptors are involved in the crosslinking using FcR blocking antibodies and/or using purified fixed effector cells that express different Fc receptors in the DuoHexaBody-CD37 cytotoxicity experiments

      - studying the effect of DuoHexaBody-CD37 on normal B cells

      As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions.

      We would like to refer to previous studies that showed potent cytotoxicity of DuoHexaBody-CD37 in vivo, including xenograft and PDX lymphoma models supporting broader translational conclusions:

      Oostindie et al. Blood Cancer Journal (2020) 10:30 https://doi.org/10.1038/s41408-020-0292-7

    1. Author response:

      We thank the reviewers for their comments and for their constructive suggestions. We intend to submit a revised manuscript where we address the comments made in the Public Reviews as well as in the Recommendations for the Authors.

      One of our most interesting findings, as noted by the reviewers, was the discovery of a small subpopulation of cells likely arrested in G2 that accounts for a disproportionate amount of radiation-induced gene expression. In addition to the responses indicated below, we are planning to include additional “wet lab” experiments in the revised manuscript that address the properties of this seemingly important subpopulation of cells.

      Reviewer 1:

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Thank you for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

      In the revised manuscript, we will provide a more detailed quantitative analysis. For each condition, we analyzed 4 - 9 discs.

      We assume that the reviewer is referring to panels in Figure 1. We will review these images and, if necessary, repeat the experiments or choose alternative images that appear clearer.

      Reviewer 2:

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      We intend to include more “wet lab” experiments in our revised manuscript to address the identity and properties of the high-trbl cells that we have identified using the clustering approach based on cell-cycle gene expression.

      Reviewer 3:

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Thank you.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNAseq data from irradiated cells is tricky. Many cells have already died. Even those that do not incorporate propidium iodide are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Clusters 0, 1, and 2 likely contain cells in other stages of the cell cycle, including early G1. Other studies indicate that more than 70% of cells are expected to have a 4C DNA content 4 h after irradiation at 4000 Rad. The high-trbl cluster only accounts for 18% of cells. Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs. We are mining the gene expression patterns in these clusters with the goal of estimating their location in the cell cycle and will include those data in the revised manuscript.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread. We intend to present additional data that address this point and also a more thorough discussion in the revised manuscript.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways: by location in the disc and by cell-cycle state. We will attempt to look for co-localization as suggested by the reviewer.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. We are investigating other possibilities such as the maturity of discs.
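
      A split of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the study: the helper names `assign_sex` and `fraction_positive`, the count threshold of 1, and the toy counts are all hypothetical.

```python
def assign_sex(rox1, rox2, threshold=1):
    """Label a cell 'male' if combined roX1/roX2 counts reach the (assumed) threshold."""
    return "male" if (rox1 + rox2) >= threshold else "female"


def fraction_positive(cells, gene, sex):
    """Fraction of cells of the given sex with at least one count of `gene`."""
    group = [c for c in cells if assign_sex(c["roX1"], c["roX2"]) == sex]
    if not group:
        return float("nan")
    return sum(1 for c in group if c[gene] > 0) / len(group)


# Toy counts, invented purely for illustration (not data from the study).
toy_cells = [
    {"roX1": 5, "roX2": 2, "dysf": 3},
    {"roX1": 0, "roX2": 4, "dysf": 0},
    {"roX1": 0, "roX2": 0, "dysf": 1},
    {"roX1": 0, "roX2": 0, "dysf": 0},
]
male_fraction = fraction_positive(toy_cells, "dysf", "male")
female_fraction = fraction_positive(toy_cells, "dysf", "female")
```

      Similar fractions in the two groups would argue against a sex-linked effect; a real analysis would additionally need to handle dropout, since a male cell with no detected roX1/2 reads would be mislabeled by a simple threshold.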

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glia clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glia and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Thanks for the clear and accurate summary of our findings.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Thank you for the positive comments.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, a mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neurons and glial clusters is quite interesting, however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

      We agree that we do not provide mechanistic or functional insights. Our goal was to produce hypothesis generating datasets for our lab and others to use to direct functional or mechanistic studies.

      Reviewer #2 (Public review):

      In this manuscript, Epiney et al., present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      Thank you for the positive comments.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

      Thank you for the positive comments. We too hope that future research will benefit from this dataset.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) In Figures 1E and 4A, the T1 and T2 glia subsets reveal sub-clusters for several cell types as seen by the distribution of points on the UMAP. This observation is never validated or discussed. Do these sub-clusters represent true differences in identities or are they artifacts of the single-nucleus preparation? For Figure 1E, it is not clear whether specific sub-clusters (see Ensheathing-4 vs Ensheathing-5 and Astrocyte-2 vs. Astrocyte-6) are differentially enriched between the T1 and T2 lineages. The existence of these sub-clusters must be discussed or dismissed.  

      We agree that this needs to be addressed more clearly in the manuscript and have made text changes in the Results and Discussion sections to clarify. We note that a recent glial cell atlas (Lago-Baldaia et al., 2023: PMID: 37862379) of the developing fly VNC and optic lobes found sub-clusters that mapped to the same subtype annotations. Interestingly, Lago-Baldaia and colleagues found that the transcriptional diversity of glia cell types did not match the morphological diversity of glia validated in vivo. See text changes below.

      Lines 131-133: “Similar to a previous glial cell atlas (Lago-Baldaia et al., 2023) we found some glial subtypes (astrocytes, ensheathing, and subperineurial) mapped to multiple clusters (Figure 1E, 1F).”

      Lines 206-208: “In line with our T1+T2 atlas and previous glia cell atlas (Lago-Baldaia et al., 2023), some subtypes mapped to several subclusters including ensheathing, astrocytes, and chiasm (Figure 4A-B).”

      Lines 397-401: “Similar to a recent glial cell atlas (Lago-Baldaia et al., 2023), we found glial subtypes like astrocytes, ensheathing, and subperineurial glia mapped to several sub-clusters (Figure 1E-F). It remains unclear if these sub-clusters with the same cell type annotation represent distinct glial identities or different transcriptional states within these populations.”

      (2) The authors present evidence for sex-specific neuronal and glia subtypes and find differential expression of specific yolk proteins and long non-coding RNAs. However, whether any of these differences are driven by other canonical sex-specific genes such as Fruitless (Fru) or Double-sex (Dbx) has not been reported or discussed. The authors must re-analyze their data for these genes and claim whether they have any contribution to sex-specific sub-clusters.

      Thank you for pointing this out. We have made text changes and clarifications to highlight the expression of other canonical sex-specific genes. Fru was enriched in male nuclei as expected. Interestingly, dbx was enriched in female nuclei. It remains to be determined if these genes are mechanisms that may be driving sex-specific changes.

      Lines 224-226: “Additionally, female nuclei were enriched for dbx (Supp Table 8). Male glial nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5C; Supp Table 8) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 237-239: “Male nuclei expressed higher levels of genes including the male-specific genes lncRNA:rox1/2 and fru (Figure 5G; Supp Table 9) (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997).”

      Lines 428-431: “We found the expected differential expression of yolk proteins (yp1, yp2, yp3) enriched in female nuclei and the long non-coding RNAs rox1/2 and fru enriched in male neuronal nuclei (Ryner et al., 1996; Amrein and Axel, 1997; Meller et al., 1997; Warren et al., 1979). Interestingly, we found dbx to be enriched in both glial and neuronal female nuclei.”

      Lines 433-435: “It remains to be determined if these genes are driving these sex-specific differences in glia and neurons.”

      (3) In Figure 6C, it is unclear whether the Ms-2A-LexA-expressing neurons of clusters 157 and 160 project to two different neuropils or share projections to both neuropils. However, it is not explicitly shown in the immunostaining data whether indeed there are two populations to begin with. The authors must check for cluster 157 and 160 specific markers (such as Dh44 and ple) and test whether they appear mutually exclusively in the Ms-2A-LexA-expressing neurons. The same reasoning would apply to the data shown in Figures 6D and 6E, where the authors must test whether the NPF and AstA expressing cells are indeed neurons from clusters 100 and 128, using orthogonal cluster markers to conclude that they are similar (or the same) neurons.

      We changed the focus of the paragraph to confirm that these neurons indeed come from type II and that they target the central complex. Although due to the lack of reagents we cannot test the identity of each one of these neurons, we could make meaningful interpretations of the staining to validate our ideas about neuropeptidergic cells in the central complex. We made sure to mention the limitation of our experiment to avoid any wrong conclusions.

      Minor points

      (1) Line 115 - "cluster that represents optic lobe neurons". How was this cluster identified?

      We reexamined the most significant genes enriched in this cluster 124, and found they are Rh2, ninaC, trpl, and phototransduction related genes (Supplemental table 1). We reassigned the identity of this cluster as ocelli, which also express photoreceptor genes but can’t be easily removed during dissection. We modified the text as follows:

      "We used known markers (Croset et al., 2018; Davie et al., 2018; Supp Table 2) to identify distinct cell types in the central brain, including glia, mushroom body neurons, olfactory projection neurons, clock neurons, Poxn+ neurons, serotonergic neurons, dopaminergic neurons, octopaminergic neurons, corazonergic neurons, hemocytes, and ocelli (Figure 1B, Supp. Table 1)."

      (2) As the separation in Figure 1B is not obvious, annotated cell type clusters must be re-colored instead of being labelled as the exact dots are indistinguishable. This would especially be helpful for OCTY, SER, OPN, and CLK clusters.

      (3) Cluster labels in Figure 1C are barely visible and the font size must be increased for the reader. Recoloring the cluster identities and attaching a legend would again help in this case.

      We recolored the atlas in Figure 1B, 1C and 1C’ and increased the font size in Figure 1C’.

      (4) For Figure 4A, clusters should be labelled on the UMAP along with the legend as it is difficult for the reader to match identities using Seurat colors. The same is true for the UMAPs in Figure 5A.

      Yes, we agree that labeling would improve readability and have done so for UMAPs in Figure 4A and 5A-A’’.

      Reviewer #2 (Recommendations for the authors):

      In this manuscript, Epiney et al. present a single-nucleus sequencing analysis of adult central brain neurons and glia. Through the use of an ingenious permanent labeling technique, they are able to trace the progeny of T2 neuroblasts, which contribute significantly to the formation of the central complex. This transcriptomic dataset is the first of its kind and will likely serve as a valuable resource for future studies.

      The authors further explore this dataset through several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. However, the approach used to identify the identity of each neuron cluster could be more clearly articulated, and some of the authors' conclusions are more generalized - either already well-established or lacking sufficient support.

      Detailed comments:

      Abstract - "Our data support the hypothesis that each transcriptional cluster represents one or a few closely related neuron subtypes." - Is this a novel finding? If so, it would be helpful if the authors could explain why this is the case more clearly.

      Our results are not generally novel, and many single cell/single nuclei RNA-seq papers have been published (more citations added to Introduction). Our work is novel in that we analyze Type 1 and Type 2 neuroblasts in the central brain.

      Line 53 - In the introduction the authors should also reference other single-cell studies done in the Drosophila brain.

      Done.

      Line 59 - There are some typos here. The authors could also mention type zero.

      Both done.

      Figure 1 and Sup Table 1 - Authors show in sup table 1 the top cell markers by cluster but there is no correspondence between cluster number and identity. The authors do not say which known markers were used to give the identity to each cluster.

      We have added the cell identity in the Supplemental Table 1. For the unknown cells, we left the column blank. We have also added a Supplemental Table 2 to show the markers we used to give identity to the clusters.

      Supplementary Tables - For each table, more detailed information should be provided regarding what is being compared and the methods used for these comparisons.

      We have added the methods we used in Seurat to generate each individual table.

      Line 138 - Differential gene expression analysis between T1 and T2 glial progeny did not show differences across any glial cell types (Supp Table 4). - Was this comparison done per cluster? Is differential gene expression of top markers, which are anyway the genes that define each glial cell type, enough for this type of analysis?

      Yes, we performed the differential expression analysis using all genes (i.e., not just marker defining) at a cluster-by-cluster resolution with results in Supplemental Table 4. We have edited the text to make this clarification.

      Lines 139-141: “Differential gene expression analysis for all genes between T1 and T2 glial progeny did not show differences across any glial cell types or clusters (Supp Table 4).”

      Line 146 - We identified T1-derived neurons by excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters. - Bioinformatically, correct?

      Yes. We clarified the sentence as follows:

      "We identified T1-derived neurons by bioinformatically excluding cells co-expressing T2-specific markers FLP+/GFP+/RFP+ plus repo+ glial clusters."

      Line 156 - We found that each cluster strongly expressed a unique combination of genes. - As they are grouped by seurat in different clusters, why is this surprising?

      Line 175 - "top 10 significantly enriched genes gathered from each T2 neuron cluster" - can these lists be included?

      Yes, they are grouped by Seurat. We toned down the sentence and now refer to each combination of genes as cluster markers. We modified the sentence as follows:

      Each unique combination of enriched genes could be referred to as cluster markers.

      Line 211- How did the authors identify sex-biased clusters? How did the authors separate the samples/cells by sex? Was it done bioinformatically by the expression of certain genes? If so, which?

      We collected male and female nuclei separately. We have added text in the methods section as follows:

      "Equal amounts of male and female central brains (excluding optic lobes) were dissected at room temperature within 1 hour. The samples were flash-frozen in liquid nitrogen and stored separately at -80 °C.

      In the first round, we pooled male and female brains together to select GFP+ nuclei and used particle-templated instant partitions to capture single nuclei to generate a cDNA library (Fluent BioSciences, Watertown, MA). In the second and third rounds, RFP+ nuclei from male and female brains were collected separately. The split-pool method was then used to generate barcoded cDNA libraries from each individual nucleus."

      Are there sex-specific differences in genes in glia other than genes that were previously known to be sex-specific?

      We report the comprehensive list of sex-specific differences in gene expression for both glia and neurons in Supp tables 8 and 9.

      Line 237 - When the authors mention "We conclude that male and female adult T2 neurons have sex-specific differences in gene expression within the same neuronal subtype" does this mean that these neurons are the same in male and in female brains, but they additionally specifically express sex-specific genes?

      Yes, we report that males and females contain the same neurons, as defined by their transcriptional profile. It remains to be seen whether these sex-specific differences change how the same neuronal subtypes function in males and females. We have added text in the discussion to expand on this thought.

      Lines 437-441: “It remains to be determined if these genes are driving sex-specific differences within glial and neuronal subtypes. These genes may reflect sex-specific differences in the adult central brain and may provide insight into how behavioral circuits are linked to sex-specific behaviors. Future work should aim to characterize and test these genes.”

      Line 250 - The idea behind these sections "What is the relationship between neuropeptide expression and cluster identity?" "relation between cluster and morphology" lacks clarity. As clusters are defined based on principal component analysis, and the genes used to define a cluster are dependent on this method, there is no assumption that each cluster represents only one type of neuron or that it should include only neurons expressing the same neurotransmitter genes. Even if some clusters consist of a single neuron type, this should not be generalized to all clusters (and vice-versa).

      Correct, we cannot determine from the transcriptome data whether distinct clusters will have different morphology. We have changed the focus of the question to address that we are confirming they come from type 2 and that they target the central complex while comparing to known cells that express the neuropeptide.

      Line 265 - We first assayed the neuronal morphology of Ms+ neurons - why did the authors choose these neurons?

      Resolved in main text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 268 - "Currently we can't determine whether Ms+ neurons in clusters 157 and 160 project to different CX neuropils, or whether neurons from both clusters share projections into both neuropils. " - The purpose of this point is unclear.

      Resolved in text: “we found that type II-derived Ms-2A-LexA-expressing neurons project to multiple layers of the dorsal fan-shaped body and the entire ellipsoid body, suggesting an unknown class of Ms+ neurons targeting to EB and/or FB”.

      Line 279 - This analysis could be more explored.

      Thank you for your feedback. As the comment was somewhat broad, we were unsure of the specific revisions needed and have therefore left the text unchanged.

      Line 301 - The text regarding this section, and the description and details of respective figures should be proofread to ensure clarity.

      Done.

      Line 386 - Alternatively, co-expression may be due to background from RNAs released during dissociation. - RNA in soup could be bioinformatically analysed.

      Correct. We opted to delete this sentence since our split-pool based method does not create background RNA expression. Additionally, the analysis is performed on scaled expression >2, and any background RNA is unlikely to yield such high expression.

      Discussion - Some of the conclusions are a bit too general, suggesting that the results might be meaningful, but also acknowledging the possibility of artifacts. If the authors could refine this, it would strengthen the manuscript.

      We are sorry but we are uncertain what you are asking; we don't know what you want us to refine. Our apologies for the misunderstanding.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      This review evaluates the SCellBOW framework, which applies phenotype algebra to obtain vectors from cancer subclusters or user-defined subclusters.

      Strengths:

      SCellBOW employs an innovative application of NLP-inspired techniques to analyze scRNA-seq data, facilitating the identification and visualization of phenotypically divergent cell subpopulations. The framework demonstrates robustness in accurately representing various cell types across multiple datasets, highlighting its versatility and utility in different biological contexts. By simulating the impact of specific malignant subpopulations on disease prognosis, SCellBOW provides valuable insights into the relative risk and aggressiveness of cancer subpopulations, which is crucial for personalized therapeutic strategies. The identification of a previously unknown and aggressive AR−/NElow subpopulation in metastatic prostate cancer underscores the potential of SCellBOW in uncovering clinically significant findings.

      Major concerns:

      The reliance on bulk RNA-seq data as a reference raises concerns about potentially misleading results due to the presence of RNA expression from immune cells in the TME. It is unclear if SCellBOW adequately addresses this issue, which could affect the accuracy of the cancer subcluster vectors.

      We appreciate the reviewer's concerns. To address the concern about potentially misleading results due to the TME when using bulk RNA-seq data as a reference:

      a. We account for systematic biases between the single-cell and bulk transcriptomics readouts by creating pseudo-bulk profiles for single-cell clusters, enabling more accurate comparisons [Section Materials and methods, Data preparation for phenotype algebra].

      b. We encode expressions into word vectors and co-embed them together. By doing this, we mitigate any possibility of systematic differences in the embedding. It is imperative that we subject both single-cell and bulk data through the same treatments because otherwise, it will be difficult to perform algebraic operations on them [Section Materials and methods, Generating vectors for phenotype algebra].

      c. In our new analysis of the tumor microenvironment, we have shown that SCellBOW effectively differentiates between malignant and non-malignant cells, confirming that it is not biased by the immune cell composition in the bulk RNA-seq data [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].
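
      As an illustration of step (a), a pseudo-bulk profile can be formed by summing raw counts over the cells of a cluster and rescaling every profile to a common library size. The following is a minimal sketch under stated assumptions, not SCellBOW's implementation: the function name `pseudo_bulk`, the toy gene names and counts, and the counts-per-million rescaling are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def pseudo_bulk(counts, cluster_labels, scale=1_000_000):
    """Sum raw counts per cluster, then rescale each profile to `scale` total counts.

    counts: list of per-cell dicts mapping gene -> raw count
    cluster_labels: list of cluster ids, one per cell (same order as `counts`)
    """
    summed = defaultdict(lambda: defaultdict(float))
    for cell, label in zip(counts, cluster_labels):
        for gene, value in cell.items():
            summed[label][gene] += value

    profiles = {}
    for label, genes in summed.items():
        total = sum(genes.values())
        if total > 0:
            profiles[label] = {g: v * scale / total for g, v in genes.items()}
        else:
            profiles[label] = dict(genes)  # empty cluster: nothing to rescale
    return profiles


# Toy data (hypothetical): three cells assigned to two clusters.
cells = [{"geneA": 3, "geneB": 1}, {"geneA": 1, "geneB": 3}, {"geneC": 4}]
labels = ["CL0", "CL0", "CL1"]
profiles = pseudo_bulk(cells, labels)
```

      After this step, each cluster is described by one vector on the same scale as a bulk sample, which is what allows both data types to go through identical downstream processing.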

      The method of extracting vectors in phenotype algebra appears to be a straightforward subtraction operation. This simplicity might limit its efficiency in excluding associations with phenotypes from specific subpopulations, potentially leading to inaccurate interpretations of the data.

      Thanks for this excellent query. Vector algebra operations are not performed in the gene expression space (i.e., on gene expression vectors associated with tumor samples). Rather, we process the single-cell and bulk expression profiles through multiple steps (pseudo-bulk vector generation for single-cell clusters, mapping of gene expression values to word frequencies that are better understood by the Doc2vec neural network, etc.) to ensure that their embeddings are consistent and capture intricate phenotypic information. We have demonstrated this through rigorous validation of the clusters yielded on various types of healthy and diseased samples. Furthermore, we have demonstrated the consistency of the vector algebra operations on known cancer subtypes in breast cancer, glioblastoma, and prostate cancer. We have clarified this further in the text [Section Materials and methods, ‘Generating vectors for phenotype algebra’, ‘Survival risk attribution’].
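
      The mapping from expression values to word frequencies mentioned in this response can be pictured as follows. This is a hedged sketch that assumes each gene name is repeated in proportion to its expression relative to the most highly expressed gene; the function name `expression_to_document`, the `max_repeats` parameter, and the rounding rule are hypothetical and are not taken from the SCellBOW code.

```python
def expression_to_document(profile, max_repeats=10):
    """Turn a gene -> expression mapping into a list of gene-name 'words'.

    Each gene is repeated round(max_repeats * value / peak) times, so more
    highly expressed genes occur more often in the resulting document.
    """
    peak = max(profile.values(), default=0)
    if peak <= 0:
        return []
    words = []
    for gene in sorted(profile):  # sorted only to make the word order deterministic
        repeats = round(max_repeats * profile[gene] / peak)
        words.extend([gene] * repeats)
    return words


# Toy profile (hypothetical values).
doc = expression_to_document({"geneA": 8.0, "geneB": 4.0, "geneC": 0.0}, max_repeats=4)
```

      A word list of this kind is the sort of input a document-embedding model such as Doc2vec consumes, which is why single-cell and bulk profiles can be embedded in one shared space once both are written as documents.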

      The review would benefit from additional validation studies to assess the effectiveness of SCellBOW in distinguishing between cancerous and non-cancerous signals, particularly in heterogeneous tumor environments.

      We thank the reviewer for advising this additional validation. While our study primarily focused on signals from malignant cells, we have now considered the impact of the tumor microenvironment. We observed that the predicted risk score increases when the immune component is subtracted from the tumor, suggesting that tumor aggressiveness increases in the absence of immune components. Importantly, the aggressiveness ranking of tumor subtypes (NE > ARAL > ARAH) remained consistent, confirming that SCellBOW effectively preserves subtype-specific risk stratification [Section SCellBOW facilitates survival-risk attribution of tumor subpopulations, Fig. 5g-h].

      Further clarification on how SCellBOW handles mixed-cell populations within bulk RNA-seq data would strengthen the evaluation of its applicability and reliability in diverse research settings.

      We really appreciate the reviewer’s observation. We clarify that rather than relying on absolute gene expression values, SCellBOW maps bulk RNA-seq data into an embedding space, where we extract the latent representation of the tumor. This process effectively masks the influence of mixed-cell populations, reducing biases introduced by immune or stromal components. Furthermore, phenotype algebra operates within this embedding space by comparing cosine similarities between latent representations of bulk and pseudo-bulk datasets, rather than using direct gene expression values. This allows SCellBOW to capture biologically meaningful relationships and infer tumor-specific signals effectively, even in the presence of heterogeneous cell populations. Our benchmarking across diverse cancer types confirms its effectiveness [Section Results, ‘SCellBOW enables pseudo-grading of metastatic prostate cancer tumor microenvironment’, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].
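
      The two operations named in this response, subtraction in the embedding space and comparison by cosine similarity, can be sketched in a few lines. The three-dimensional vectors below are hypothetical stand-ins for learned embeddings, and the step that relates the resulting query vector to survival risk is deliberately not shown, since its details are not given here.

```python
import math


def subtract(u, v):
    """Element-wise difference u - v of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings (illustration only, not learned vectors).
total_tumor = [1.0, 1.0, 0.0]   # average bulk tumor embedding
cluster = [0.0, 1.0, 0.0]       # pseudo-bulk embedding of one malignant cluster
reference = [1.0, 0.0, 0.0]     # embedding of a reference bulk sample

query = subtract(total_tumor, cluster)            # "total tumor - cluster"
before = cosine_similarity(total_tumor, reference)
after = cosine_similarity(query, reference)
```

      Because cosine similarity depends only on the direction of the vectors, the comparison is insensitive to overall magnitude, which is one reason it is a common choice for comparing embeddings.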

      Reviewer #2 (Public Review):

      The authors developed a novel tool, SCellBOW, to perform cell clustering and infer survival risks on individual cancer cell clusters from the single-cell RNA seq dataset. The key ideas/techniques used in the tool include transfer learning, bag of words (BOW), and phenotype algebra which is similar to word algebra from natural language processing (NLP). Comparisons with existing methods demonstrated that SCellBOW provides superior clustering results and exhibits robust performance across a wide range of datasets. Importantly, a distinguishing feature of SCellBOW compared to other tools is its ability to assign risk scores to specific cancer cell clusters. Using SCellBOW, the authors identified a new group of prostate cancer cells characterized by a highly aggressive and dedifferentiated phenotype.

      Strengths:

      The application of natural language processing (NLP) to single-cell RNA sequencing (scRNA-seq) datasets is both smart and insightful. Encoding gene expression levels as word frequencies is a creative way to apply text analysis techniques to biological data. When combined with transfer learning, this approach enhances our ability to describe the heterogeneity of different cells, offering a novel method for understanding the biological behavior of individual cells and surpassing the capabilities of existing cell clustering methods. Moreover, the ability of the package to predict risk, particularly within cancer datasets, significantly expands the potential applications.

      Major concerns:

      Given the promising nature of this tool, it would be beneficial for the authors to test the risk-stratification functionality on other types of tumors with high heterogeneity, such as liver and pancreatic cancers, which currently lack clinically relevant and well-recognized stratification methods. Additionally, it would be worthwhile to investigate how the tool could be applied to spatial transcriptomics by analyzing cell embeddings from different layers within these tissues.

      (1) We completely agree with the reviewer’s view. Our selection of glioblastoma and breast cancer for this study was primarily driven by the focus on extensively studied and well-defined cancer types. To demonstrate the effectiveness of our model, we tested it on advanced prostate cancer, which currently lacks clinically relevant and well-recognized stratification methods. This application to metastatic prostate cancer serves as a proof of concept, illustrating our model's potential to provide valuable insights into cancer types where established stratification approaches are limited or absent.

      (2) Regarding the application of our tool to spatial transcriptomics, we have already analyzed data from Digital Spatial Profiling (DSP). The article is already quite complex and involved, and we are afraid the inclusion of spatial transcriptomics may amount to a significant extension of the method. To this end, although we will discuss the future possibilities, we will skip the method validity check on spatial transcriptomics data.

      Reviewer #2 (Recommendations For The Authors):

      (1) "SCellBOW adapts the popular document-embedding model Doc2vec for single-cell latent representation learning, which can be used for downstream analysis...": Using only simple gene frequency might overlook the dependent relationships between genes, potentially compromising the biological significance. This could be discussed further.

      This is an excellent point raised by the reviewer. We acknowledge that using only simple gene frequency may overlook dependent relationships between genes, potentially compromising biological significance. To address this, we have now compared SCellBOW on the specific task of phenotype algebra and demonstrated its effectiveness in capturing meaningful biological relationships that are overlooked by simple gene frequency. We have added the results of this comparison, which show that gene expression data alone were insufficient for accurate risk stratification [Section Overall discussion, Supplementary Note 7, Supplementary Fig. 8i-k].

      (2) "While existing methods effectively reveal the subpopulations, they are insufficient in associating malignant risk with specific cellular subpopulations identified from scRNA-seq data....": Perhaps I missed it in the methods section, but how does SCellBOW compare to simply performing pseudobulk analysis on separate cell clusters, treating them as bulk RNA-seq, and then associating the signatures with disease prognosis?

      This is an insightful point, and we appreciate the opportunity to clarify it.

      (1) While pseudobulk analysis on separate cell clusters, followed by associating their signatures with disease prognosis, is a common approach, SCellBOW achieves this without requiring a priori knowledge of prognostic biomarkers to determine whether a subpopulation is aggressive.

      (2) Moreover, pseudobulk analysis aggregates gene expression across cells, which can potentially mask intra-cluster heterogeneity, thereby obscuring important signatures associated with disease prognosis. In contrast, the latent representation in SCellBOW captures the semantic meaning of disease aggressiveness, allowing for a more nuanced and biologically meaningful risk assessment.

      (3) "The proposed approach, SCellBOW, can effectively capture the heterogeneity and risk associated with each phenotype, enabling the identification and assessment of malignant cell subtypes in tumors directly from scRNA-seq gene expression profiles, thereby eliminating the need for marker genes...": Have the authors compared the resulting groups with well-known markers, and do they overlap?

      We appreciate this thoughtful question. While SCellBOW does not rely on predefined marker genes for clustering or risk stratification, we have systematically evaluated whether the resulting subpopulations align with well-known markers. To assess this, we compared SCellBOW-derived clusters with established marker-based annotations across multiple datasets. We observed a significant overlap between SCellBOW clusters and canonical marker-defined cell types in various cancers, including GBM, BRCA, and mCRPC.

      (4) "We constructed three use cases leveraging publicly available scRNA-seq datasets...": The three training and testing datasets are all from healthy tissue. How about in tumor tissue? i.e., Could SCellBOW also identify better cell clusters in tumor datasets?

      We appreciate the reviewer’s inquiry. For benchmarking and method validation, we primarily selected normal tissue datasets as they are heavily annotated and well-characterized. Our goal was to extensively evaluate SCellBOW across different clustering metrics, including ARI, NMI, and SI, which required datasets with reliable ground truth. Tumor datasets, in contrast, often lack confirmatory ground truth, making direct benchmarking more challenging. However, to assess SCellBOW’s applicability in tumor settings, we performed downstream analyses on tumor scRNA-seq datasets using phenotype algebra. Our results demonstrate that SCellBOW effectively identifies distinct cell clusters, including malignant and non-malignant populations, reinforcing its applicability in tumor settings [Section Results, ‘Unsupervised risk-stratification of metastatic prostate cancer clusters using SCellBOW’].

      Minor issues:

      (1) Labels of subplots within the manu/figure should be revised to ensure correct order (missing Figures 3a-d, 4b before 4a, etc).

      We thank the reviewer for pointing this out. We have corrected the figure labels and ensured that all subplots follow the correct order, aligning with the manuscript.

      (2) "reaffirmed the clinically known aggressiveness order, i.e., CLA >-MES >-PRO, where CLA succeeds the rest of the subtypes in aggressiveness48 (Figures 4c, d)...": "Fig. 4c, d" should be "Fig. 4e, f". Also please put Figure 4a before 4b. Overall the order of Figure 4 needs to be revised to match the order in the manu. Similar to Figure 6.

      We have corrected the figure reference to Fig. 4e, f and revised the order of Figure 4 to maintain consistency with the manuscript.

      (3) "Our results showed that SCellBOW learned latent representation of single-cells accurately captures the 'semantics' associated with cellular phenotypes and allows algebraic operations such as '+' and '-'." Figure 5f (SCellBOW performances on mCRPC) should also be cited here since Supplementary Figure 6 contains three datasets (GBM, BRCA, mCRPC) while in Figure 4 only GBM and BRCA were shown?

      We thank the reviewer for this suggestion. We have now cited Figure 5f in this section to ensure that all datasets, including mCRPC, are appropriately referenced.

      (4) Under the subheading "SCellBOW facilitates survival-risk attribution of tumor subpopulations", the lines start with "We refer to this as phenotype algebra. We utilized this ability to find an association between the embedding vectors, representing total tumor - a specific malignant cell cluster with tumor aggressiveness..." could be reduced a little bit especially the re-intro of phenotype algebra since the author has already discussed previously (under "overview of SCellBOW").

      We appreciate the feedback and have condensed this section to avoid redundancy while maintaining clarity in connecting phenotype algebra to survival-risk attribution.

      (5) "Most CD4+ T cells map to CL0 and CL9 (here, CL is used as an abbreviation for cluster) (Figure 3f)..." "(here, CL is used as an abbreviation for cluster)" this note could be moved forward to SF2 since CL is first introduced in SF2.

      We thank the reviewer for the suggestion. We have moved the definition of CL (cluster) to Supplementary Figure 2 (SF2), where it is first introduced, for improved clarity.

    1. Author response:

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have addressed several of the concerns in the responses below and are currently working on additional analyses to further strengthen the study. These results will be incorporated into the final version of the research paper.

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
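
      The claim that a few generations of selfing suffice to produce inbred lines follows from the standard mixed-mating recursion for the inbreeding coefficient, F(t+1) = s(1 + F(t))/2, where s is the selfing rate. The sketch below is a textbook calculation rather than an analysis from the study; the starting heterozygosity of 0.5 and the selfing rates are illustrative assumptions, not estimates for Lantana camara.

```python
def inbreeding_coefficient(selfing_rate, generations, f0=0.0):
    """Iterate F(t+1) = s * (1 + F(t)) / 2 for a mixed-mating population."""
    f = f0
    for _ in range(generations):
        f = selfing_rate * (1.0 + f) / 2.0
    return f


def expected_heterozygosity(h_random_mating, selfing_rate, generations):
    """Expected heterozygosity H(t) = H0 * (1 - F(t))."""
    f = inbreeding_coefficient(selfing_rate, generations)
    return h_random_mating * (1.0 - f)


# Complete selfing halves heterozygosity every generation.
h_after_5 = expected_heterozygosity(0.5, selfing_rate=1.0, generations=5)

# Under partial selfing, F approaches the equilibrium s / (2 - s).
f_equilibrium = inbreeding_coefficient(selfing_rate=0.9, generations=200)
```

      With complete selfing, five generations reduce an initial heterozygosity of 0.5 to 0.5/32, which shows how quickly near-homozygous lines can form after a hybridization event.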

      While a recent bottleneck may have increased inbreeding, the strong and consistent genetic structuring we observe within populations is more indicative of predominant self-fertilization. To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the influence of the mating system is remarkably strong, resulting in a clear and consistent genetic structure across populations.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using data on 19,008 genome-wide SNPs from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study would be that the output of the modelling analysis is largely qualitative and cannot be directly compared to the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-reported knowledge, and such findings themselves are not novel, although they might have become interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, by ABC-based methods, the authors can infer the split time between subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result supports the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location—similar to what we observed in Lantana camara—can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation.

      We thank the reviewer for the suggestion regarding the use of simulations based on genomic data from Lantana and for explaining the importance of it. We are currently conducting demographic simulations using genomic data from Lantana to estimate divergence times between the different flower colour variants. We believe this analysis will offer deeper insights and provide further clarity on the points raised by the reviewers.

      I also have several concerns regarding the authors' population genetic analyses. First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Hardy-Weinberg Equilibrium (HWE) filtering is a commonly used step in SNP filtering analysis to exclude loci potentially under selection, thereby enriching for neutral variants and minimizing bias in downstream analyses. To ensure that our results are not influenced by selection-driven SNPs, we conducted the analysis both with and without applying the HWE filter. Notably, the number of SNPs retained did not drop significantly after filtering, and the overall patterns observed remained consistent across both approaches.
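
The interaction between selfing and an HWE filter can be made concrete with the standard population-genetic relations for a partially selfing population at equilibrium: the inbreeding coefficient is F = s / (2 - s) for selfing rate s, and heterozygotes are expected at frequency 2pq(1 - F). The sketch below is illustrative only; the function names are hypothetical and the numbers are not taken from the Lantana data.

```python
def equilibrium_inbreeding(selfing_rate):
    """Equilibrium inbreeding coefficient for selfing rate s: F = s / (2 - s)."""
    return selfing_rate / (2.0 - selfing_rate)


def genotype_frequencies(p, selfing_rate):
    """Expected (AA, Aa, aa) frequencies at a biallelic locus with allele frequency p.

    With F = 0 this reduces to the Hardy-Weinberg proportions p^2, 2pq, q^2.
    """
    q = 1.0 - p
    f = equilibrium_inbreeding(selfing_rate)
    return (p * p + f * p * q, 2.0 * p * q * (1.0 - f), q * q + f * p * q)


def selfing_rate_from_heterozygosity(h_obs, h_exp):
    """Estimate s from observed and expected heterozygosity.

    Uses F = 1 - Ho/He and inverts F = s / (2 - s) to s = 2F / (1 + F).
    Assumes the heterozygote deficit is caused by selfing alone.
    """
    f = 1.0 - h_obs / h_exp
    return 2.0 * f / (1.0 + f)
```

Under these relations a highly selfing population departs from Hardy-Weinberg proportions at essentially every polymorphic locus, which is why comparing results with and without the filter, as done above, is informative.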

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate. Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      The aim of the SLiM simulation was to demonstrate that the extreme genetic structuring observed in Lantana camara can plausibly arise in natural systems under predominant self-fertilization. For the simulation, we used mutation and recombination rates estimated for Arabidopsis thaliana, as these parameters are currently unknown for Lantana. We thank the reviewer for pointing this out and will add these details in the revised version. While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilization alone. The effect of selfing on the effective mutation rate is not currently incorporated in the simulations. We will look into the details of this.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We recognize that one of the key improvements needed for the manuscript is to provide experimental evidence supporting self-fertilization. To address this, we conducted a bagging experiment on Lantana camara inflorescences to prevent insect visitation and eliminate insect-mediated cross-fertilization. The results showed no significant difference in seed set between bagged and open-pollinated inflorescences, indicating that Lantana is predominantly self-fertilizing in India. This finding is consistent with our genetic data and will be included in the revised version of the manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively? I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The different flower colour variants are visually distinguishable. Our classification of these variants is not based on the colour of individual flowers at a single time point, but rather on the overall colour change pattern across the inflorescence over time. In other words, the temporal aspect of colour change has been considered in our grouping. For example, in the “yellow-pink” variant, flowers begin as yellow when young and gradually turn pink as they age. Importantly, variants that follow this pattern do not transition to an orange type at any stage, which distinguishes them from other colour types. The varieties that don't change colours are named based on the single flower colour like “orange”.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). Other single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note, section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.  

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully on imputed 2000-dimensional data. Furthermore, we successfully used tviblindi to investigate the bone marrow atlas scRNA-seq dataset of Zhang et al. (2024) and the atlas of mouse gastrulation of Pijuan-Sala et al. (2019). The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, the use of (linear) dimensionality reduction techniques that effectively suppress noise in the data, such as PCA, is good practice (see also response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary Note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of this topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have expanded the “sensitivity to hyperparameters” section 8.1, also in response to reviewer 2.
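
A rank-based comparison of this kind can be sketched in a few lines. The code below is a minimal, dependency-free illustration of a Spearman correlation between two pseudotime vectors (for example, one computed with k=10 and one with k=50); the function names are hypothetical and this is not the code used in the Supplementary Note.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(pseudotime_a, pseudotime_b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(pseudotime_a), average_ranks(pseudotime_b))
```

A value close to 1 means the two settings order the cells almost identically, even if the absolute pseudotime values differ.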

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
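
The first two steps summarized above can be illustrated on a toy graph. The sketch below is not the tviblindi implementation: it assumes an unweighted, connected neighbor graph given as an adjacency dictionary, defines pseudotime as the expected number of steps a simple random walk needs to first reach the origin (one possible reading of "expected hitting time", chosen here because it needs only one linear system), solves that system by plain Gauss-Seidel iteration, and omits the persistent-homology clustering entirely. All function names are hypothetical.

```python
import random


def hitting_time_pseudotime(neighbors, origin, tol=1e-10, max_iter=100000):
    """Expected number of steps for a simple random walk to first hit `origin`.

    Solves h(origin) = 0 and h(v) = 1 + mean(h(u) for u in neighbors[v])
    by Gauss-Seidel iteration. Assumes a connected graph.
    """
    h = {v: 0.0 for v in neighbors}
    for _ in range(max_iter):
        delta = 0.0
        for v, nbrs in neighbors.items():
            if v == origin:
                continue
            new = 1.0 + sum(h[u] for u in nbrs) / len(nbrs)
            delta = max(delta, abs(new - h[v]))
            h[v] = new
        if delta < tol:
            break
    return h


def orient_edges(neighbors, pseudotime):
    """Keep only edges that point towards strictly larger pseudotime (a DAG)."""
    return {
        v: [u for u in nbrs if pseudotime[u] > pseudotime[v]]
        for v, nbrs in neighbors.items()
    }


def simulate_walk(dag, origin, rng):
    """Random walk on the DAG from `origin` until a node with no outgoing edge."""
    walk = [origin]
    while dag[walk[-1]]:
        walk.append(rng.choice(dag[walk[-1]]))
    return walk


# Toy Y-shaped graph: origin 0, branch point 1, two endpoints 2 and 3.
toy_graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
toy_pseudotime = hitting_time_pseudotime(toy_graph, origin=0)
toy_dag = orient_edges(toy_graph, toy_pseudotime)
toy_walks = [simulate_walk(toy_dag, 0, random.Random(seed)) for seed in range(20)]
```

Because every retained edge increases pseudotime, each walk terminates at a node with no outgoing edge; these nodes play the role of candidate endpoints, and grouping the walks that reach them is where the topological clustering would come in.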

      We thank the reviewer for the feedback and suggestions, which we have accommodated. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models. 

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution on a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison to the very many other previous trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real world data (as was even demonstrated in the little bit of application to real world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Beyond the general lack of benchmarking there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of robustness of the method, as well as how to ensure data-driven inference and avoid over-fitting would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but it is an (unbiased) representation of what the data look like based on the diffusion model. It is a property of the data (or of the panel design), which has to be interpreted properly rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality) other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally in this particular case this branching was discovered by a bioinformatician, who knew nothing about the presence of beta-selection in the data.  

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of a structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory) is a challenge for TI algorithms and none of tested methods gave a correct result. More importantly there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as the high degree of noise (e.g. ambient RNA), introduces a very large degree of noise so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason, otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions) and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We have added this analysis to the study (Supplementary Note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline").

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality: it can be a checkpoint, as the authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe the authors can consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effects of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in the Supplementary Note, section 8.1.

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in the Supplementary Note.

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We have chosen to keep the original name.

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data, where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other, making dimensionality reduction techniques less relevant. But in the context of an unbiased assay such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell, so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression space is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code on Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq data are intended to enable quantification of individual gene expression distribution or pairwise gene associations. But when all the genes in an imputed data are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially, in the case of MAGIC, which uses a transition matrix raised to a power, it is over-smoothing of the data to use a transition matrix smoothed data to obtain another transition matrix to calculate the hitting time (or simulate random walks). Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to drop out issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA Seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA Seq datasets (Section 1 and section 10 of Supplementary Note)

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress them out. While we agree that it is valuable to remove cell cycle effect from the data for trajectory inference (and has been used previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes the expression pattern of many genes and can result in new artifacts (false positives) in the data. We recommend the authors to explore this more and consider using a more principled alternative such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8).

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023)

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested in here (end #6) are not dividing, and the regression does not change the conclusions drawn in the paper.

      (2) The figures provided are so low in resolution that it is practically impossible to correctly interpret many of the conclusions and references made in the figures (especially Figure 3 in the main text).

      Resolution of the Figures was improved

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.  To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism to discover, focus on, and resolve these issues, which would (and do) contaminate the trajectories discovered fully automatically by the tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs); Supplementary Note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts trajectory specific resolution - incorrect choice of the resolution may result in either merging distinct trajectories into one or over separating the trajectories (which is arguably much less serious). However the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDCs development) how to address the issue - tviblindi allows us to investigate deeper structure in any considered trajectory.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from CD3 negative double positive stage and entering apoptotic phase and mention tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage" and propose that the interactive version of tviblindi can help the user zoom into (increase resolution) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean up could avoid such pitfalls where the algorithm infers trajectory based on mixed phenotype and the user would not have to manually adjust the resolution to obtain a clear biological conclusion. We note that the original publication of this data did such "data clean up" using simple diffusion map based dimensionality reduction which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different, but intertwined issues we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and they are a source of false signals. In the case of the thymocytes in the human thymus this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is however complex as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in V. group. Zooming in means that the hierarchy of trajectories within V. group is revealed (panel D, groups V.a and Vb.) and can be interpreted on the vaevictis and lineplot graphs (panel E, F). 

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Even more, in our experience we realized that all current tools are taking this fully  automated approach with a search for an “ideal” set of hyperparameters. This, in our experience,  leads to a “blackbox” tool that is difficult to interpret for the expert in the biological field. To respond to this need we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive interpretation. Our interactive concept is based on 15 years of experience with the data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets. As well as comparison to other popular dimensionality reduction approaches like force directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend the authors to quantify the comparison or expand on their assessments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend the authors to make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Section 8 and 10 in Supplementary Note.  

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor CellRank 2 (see section 8.2, Supplementary Note)

      (7) The notion of using persistent homology to discover trajectories has been previously used in single cell data (https://pubmed.ncbi.nlm.nih.gov/28459448/). We request a comparison to this approach.

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes", the authors mention that there is a spurious connection between CD4 single positive cells and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request the authors to discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin) and one would think that coupled with DNA intercalator 191/193Ir one could further clean up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g. we remove any two cells from two samples with different barcodes that present with a double barcode). Although there are computational doublet removal approaches for mass cytometry (Bagwell, Cytometry A 2020), mostly applied to peripheral blood samples (where cell division is not present under steady state immune system conditions), these are not well suited for situations where dividing cells occur (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case for our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A 2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode-illegitimate doublets and keep all other cells in the analysis, using the expert analysis step also to identify (using the cell cycle markers plus other features) the artificially formed doublets and thus spurious connections.

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within a pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points) whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in GUI. See section 6 of the Supplementary Note.
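
      The segment-averaging idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not tviblindi's own code: the function name `segment_means`, its parameters, and the toy data are hypothetical, and the sketch only covers the default case of segments containing (nearly) equal numbers of cells.

```python
from statistics import mean


def segment_means(pseudotime, marker, n_segments=4):
    """Average a marker within pseudotime segments holding (nearly) equal numbers of cells.

    Hypothetical helper for illustration; not part of tviblindi.
    """
    if len(pseudotime) != len(marker):
        raise ValueError("pseudotime and marker must have the same length")
    if not 1 <= n_segments <= len(pseudotime):
        raise ValueError("n_segments must be between 1 and the number of cells")
    # Order cells by pseudotime so that each segment covers a contiguous range.
    ordered = sorted(zip(pseudotime, marker))
    n = len(ordered)
    points = []
    for i in range(n_segments):
        # Integer boundaries spread any remainder evenly across the segments.
        lo, hi = i * n // n_segments, (i + 1) * n // n_segments
        chunk = ordered[lo:hi]
        points.append((mean(t for t, _ in chunk), mean(v for _, v in chunk)))
    return points


# Toy data (invented): eight cells whose marker rises with pseudotime.
toy_time = [0.1, 0.9, 0.3, 0.5, 0.2, 0.8, 0.6, 0.4]
toy_marker = [1.0, 9.0, 3.0, 5.0, 2.0, 8.0, 6.0, 4.0]
line_points = segment_means(toy_time, toy_marker, n_segments=4)
```

      Each returned pair is one vertex of the line plot (mean pseudotime, mean marker value). Raising `n_segments` gives a finer but noisier curve.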

      Reviewer #3 (Recommendations For The Authors):

      The overall figure quality needs to be improved. For example, I can barely see the text in Figure 3c.

      Resolution of the Figures was improved

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work by Huang et al. revealed the complex regulatory functions and transcription network of 172 unknown transcription factors of Pseudomonas aeruginosa PAO1. The authors utilized ChIP-seq to profile TF binding site information across the genome, demonstrating diverse regulatory relationships among them via hierarchical networks with three levels. They further constructed thirteen ternary regulatory motifs in small subs and a co-association atlas with 7 core associated clusters. The study also uncovered 24 virulence-related master regulators. The pan-genome analysis uncovered both the conservation and evolution of TFs with P. aeruginosa complex and related species. Furthermore, they established a web-based database combining both existing and novel data from HT-SELEX and ChIP-seq to provide TF binding site information. This study offered valuable insights into studying transcription regulatory networks in P. aeruginosa and other microbes.

      Strengths:

      The results are presented with clarity, supported by well-organized figures and tables that not only illustrate the study's findings but also enhance the understanding of complex data patterns.

      Thank you for your valuable feedback on our paper exploring the transcription regulatory networks in P. aeruginosa.

      Weaknesses:

      The results of this manuscript are mainly presented in systematic figures and tables. Some of the results need to be discussed as an illustration how readers can utilize these datasets.

      We appreciate the valuable suggestion about enhancing the practical aspects of our manuscript. We have expanded the discussion section to include more detailed explanations of how these datasets can be utilized in practical applications. 

      Reviewer #2 (Public review):

      In this work, the authors comprehensively describe the transcriptional regulatory network of Pseudomonas aeruginosa through the analysis of transcription factor binding characteristics. They reveal the hierarchical structure of the network through ChIP-seq, categorizing transcription factors into top-, middle-, and bottom-level, and reveal a diverse set of relationships among the transcription factors. Additionally, the authors conduct a pangenome analysis across the Pseudomonas aeruginosa species complex as well as other species to study the evolution of transcription factors. Moreover, the authors present a database with new and existing data to enable the storage and search of transcription factor binding sites. The findings of this study broaden our knowledge on the transcriptome of P. aeruginosa. This study sheds light on the complex interconnections between various cellular functions that contribute to the pathogenicity of P. aeruginosa, along with the associated regulatory mechanisms. Certain findings, such as the regulatory tendencies of DNA-binding domain-types, provides valuable insights on the possible functions of uncharacterized transcription factors and new functions of those that have already been characterized. The techniques used hold great potential for discovery of transcription factor functions in understudied organisms as well.

      The study would benefit from a more clear discussion on the implications of various findings, such as binding preferences, regulatory preferences, and the link between regulatory crosstalk and virulence. Additionally, the pangenome analysis would be furthered through a discussion of the divergence of the transcription factors of P. aeruginosa PAO1 across species in relation to the findings on the hierarchical structure of the transcriptional regulatory network.

      Thank you for your positive feedback and suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      (1) It appears that many TFs are conserved among bacteria, archaebacteria, fungi, plants, and animals. Does this mean these TFs in bacteria could be the ancestors of TFs in fungi, plants, and animals? If we fetch these TFs out and build an evolutionary tree, can we visualize the three kingdoms as well?

      Thank you for this comment. While many TFs are conserved across bacteria, archaea, fungi, plants, and animals, this conservation does not necessarily imply a direct ancestral relationship. Instead, it may reflect the fundamental importance of certain domains and regulatory mechanisms, which could have arisen from a common ancestral system or through convergent evolution. If we fetch TF PA2032 out to build an evolutionary tree by setting PAO1 as the root, we can visualize these kingdoms in a tree. We added this content in the revised manuscript. Please see Figure S7D and Lines 404-411.

      “The phylogenetic tree of PA2032 across bacteria, archaea, fungi, plants, and animals, with PAO1 as the root revealed that the bacterial TFs (purple) indicates a high degree of conservation within prokaryotes, suggesting a fundamental role in core regulatory processes. In contrast, eukaryotic TFs (fungi, plants, and animals) form distinct clades with longer branch lengths, indicating significant divergence and specialization during eukaryotic evolution. These findings suggest that while TF is conserved across domains of life, its functional roles and regulatory mechanisms have undergone substantial diversification in eukaryotes.”

      (2) Can the authors give an indication how could we employ the findings of this study in designing next generation of antimicrobial agents?

      Thank you for this important suggestion. We have provided this content in the discussion part. Please see Lines 481-492.

      “The extensive datasets generated in this study offer valuable insights into understanding and targeting P. aeruginosa pathogenicity. The genome-wide binding profiles can be systematically analyzed through our hierarchical regulatory network framework to decode complex virulence mechanisms. The virulence-related master regulators and core regulatory clusters identified in this study highlighted key nodes of transcriptional control. Understanding these regulatory relationships is particularly valuable for identifying targets whose modulation would significantly impact virulence while accounting for potential compensatory mechanisms. This knowledge base thus provides a foundation for developing targeted approaches to combat P. aeruginosa infections, moving beyond traditional antibiotic strategies toward more sophisticated interventions based on regulatory network manipulation.”

      Minor:

      (1) Lines 178-180: It would strengthen the discussion to include a few additional references that support the claims made in this section, providing a more comprehensive context for the readers.

      Yes. We have added more citations (1-5) (Nos. 1-5 in the references at the end of the rebuttal) to support the claims. Please see Line 182.

      (2) Line 198: You mention 'seven' motifs containing toggle switches, but Fig.3 actually displays eight motifs. Please revise this discrepancy to ensure consistency between the text and the figure.

      Yes. We have revised the wording to “eight”. Please see Line 200.

      (3) Figure 3A: Consider adding a diagram or legend that represents the colors associated with each DNA-binding domain (DBD) family.

      Thank you for your suggestion. The colors of DBD were aligned with the legend in Figure S3. We have added it in Figure 3A.

      Reviewer #2 (Recommendations for the authors):

      Line 21: The use of the abbreviation 'TF' should be done at the first instance of 'transcription factor'.

      Yes. We have revised it. Please see Line 21.

      Line 74: The purpose of this paragraph is slightly unclear. It is recommended that appropriate modifications are made.

      We are sorry for the confusion. The purpose of this paragraph was to introduce the major virulence pathways in P. aeruginosa and mention the important role of TRN in these pathways. We have modified it to make it clearer. Please see Lines 74-75.

      “P. aeruginosa employs diverse virulence pathways to establish successful infection, with QS being one of the major mechanisms involving the expression of many virulence genes.”

      Line 113: How were these 172 TFs selected?

      Thank you for indicating this question. In a previous study, we performed HT-SELEX to characterize the DNA-binding motifs of all TFs in P. aeruginosa PAO1, successfully identifying binding sequences for 182 TFs. To further elucidate the binding landscapes of the rest, we performed ChIP-seq on the remaining TFs (172 TFs in total with high-quality ChIP-seq libraries). Please see Lines 100-101 in the revised manuscript.

      Line 119: Defining other features, namely downstream and include Feature, would be helpful.

      Thank you for your suggestion. We have added the definition for all peak annotation in the legend. Please see Lines 569-574.

      “Annotation heatmap of all peak distribution with 6 locations: Upstream, where the peak is located entirely upstream of the gene; Downstream, where the peak is positioned completely downstream of the gene; Inside, where the peak is entirely contained within the gene body; OverlapStart, where the peak overlaps with the 5' end of the gene; OverlapEnd, where the peak overlaps with the 3' end of the gene; and IncludeFeature, where the peak completely encompasses the gene.”
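
      The six location labels in this legend can be expressed as a small classification rule. The sketch below is illustrative only and is not the annotation tool used in the study: the function `classify_peak` is hypothetical, coordinates are assumed to be inclusive, and ties (a peak sharing a boundary with the gene or matching it exactly) are assigned to "Inside" by assumption, since the legend does not define them.

```python
def classify_peak(peak_start, peak_end, gene_start, gene_end, strand="+"):
    """Label a peak's position relative to a gene using the six legend categories.

    Hypothetical helper. Coordinates are inclusive with start <= end;
    strand is '+' or '-' and decides which side of the gene is the 5' end.
    """
    if peak_start > peak_end or gene_start > gene_end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    forward = strand == "+"
    if peak_end < gene_start:
        # Peak lies entirely on the low-coordinate side of the gene.
        return "Upstream" if forward else "Downstream"
    if peak_start > gene_end:
        # Peak lies entirely on the high-coordinate side of the gene.
        return "Downstream" if forward else "Upstream"
    if gene_start <= peak_start and peak_end <= gene_end:
        # Assumption: shared boundaries and exact matches count as Inside.
        return "Inside"
    if peak_start <= gene_start and peak_end >= gene_end:
        return "IncludeFeature"
    if peak_start < gene_start:
        # Peak crosses only the low-coordinate edge of the gene.
        return "OverlapStart" if forward else "OverlapEnd"
    # Remaining case: peak crosses only the high-coordinate edge.
    return "OverlapEnd" if forward else "OverlapStart"


# Invented example: a gene at 100..200 on the forward strand.
labels = [
    classify_peak(10, 20, 100, 200),
    classify_peak(90, 110, 100, 200),
    classify_peak(120, 150, 100, 200),
    classify_peak(190, 210, 100, 200),
    classify_peak(250, 300, 100, 200),
    classify_peak(50, 250, 100, 200),
]
```

      For a gene on the reverse strand, the same coordinates swap the 5' and 3' labels, which is why the strand argument is needed.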

      Line 129: The distribution type of AraC-type TFs is unclear - it is mentioned that AraC has a 'broad distribution', but it is later stated that it has a 'narrow distribution'.

      We are sorry for this mistake, and we have revised the example for “broad distribution”, which is Cor_CI instead of AraC. Please see Lines 132-135.

      Line 161: 'h value' here may need to be modified to 'absolute h value'.

      Yes. We have revised it. Please see Line 164.

      Line 502: "s The DNA" needs to be corrected.

      Yes. We have revised it. Please see Line 514.

      Line 515: It would be helpful to readers if the reference used for these pathways was cited.

      Yes. We have added the review reference (Shao et al, 2023) related to these pathways(6) (the 6th reference at the end of the rebuttal). Please see Line 527.

      Line 558: "Translation start site" needs to be corrected to "Transcription start site"

      The “TSS” here indeed indicates “Translation start site”.

      Line 593: "Virulent" pathways needs to be corrected to "virulence" pathways.

      Yes. We have revised it. Please see Line 609.

      Line 604: The type of categorization based on which the proportion of genes is displayed needs to be mentioned.

      Yes, we agree. We have added the type of categorization in the legend. Please see Lines 621-627.

      “Figure 6. Conservation and variability of TFs in PAO1. (A). The pie chart shows the proportions of genes categorized by their presence across P. aeruginosa strains for all genes. (B). The pie chart shows the distribution of TFs identified from PAO1 across different conservation categories. (C). The bar plot of the proportion for non-core TFs. Genes are categorized based on their presence frequency across P. aeruginosa strains: Core genes (present in 99% ~ 100% strains), Soft core genes (present in 95% ~ 99% strains), Shell genes (present in 15% ~ 95% strains), and Cloud genes (present in 0% ~ 15% strains).”
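
      The presence-frequency categories in this legend amount to a simple threshold rule, sketched below. The function `pangenome_category` is hypothetical and is not the pangenome software used in the study. Because the legend's ranges share their boundary values, the sketch assumes each lower bound is inclusive (for example, exactly 95% counts as Soft core).

```python
def pangenome_category(n_present, n_strains):
    """Assign a gene to Core / Soft core / Shell / Cloud by strain presence.

    Hypothetical helper. Assumes lower bounds are inclusive: Core >= 99%,
    Soft core >= 95%, Shell >= 15%, Cloud below 15%.
    """
    if n_strains <= 0 or not 0 <= n_present <= n_strains:
        raise ValueError("need 0 <= n_present <= n_strains and n_strains > 0")
    # Compare integers (100 * present vs. threshold * total) to avoid
    # floating-point rounding at the category boundaries.
    scaled = 100 * n_present
    if scaled >= 99 * n_strains:
        return "Core"
    if scaled >= 95 * n_strains:
        return "Soft core"
    if scaled >= 15 * n_strains:
        return "Shell"
    return "Cloud"


# Invented example: presence counts out of 200 strains.
example = {n: pangenome_category(n, 200) for n in (200, 198, 190, 120, 30, 29, 0)}
```

      Applying the function to every gene and then to the TF subset would yield the proportions of the kind shown in panels A and B.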

      References:

      (1) Liang H, Deng X, Li X, Ye Y, Wu M. 2014. Molecular mechanisms of master regulator VqsM mediating quorum-sensing and antibiotic resistance in Pseudomonas aeruginosa. Nucleic acids research 42:10307-10320.

      (2) Jones CJ, Ryder CR, Mann EE, Wozniak DJ. 2013. AmrZ modulates Pseudomonas aeruginosa biofilm architecture by directly repressing transcription of the psl operon. Journal of bacteriology 195:1637-1644.

      (3) Hickman JW, Harwood CS. 2008. Identification of FleQ from Pseudomonas aeruginosa as a c‐di‐GMP‐responsive transcription factor. Molecular microbiology 69:376-389.

      (4) Déziel E, Gopalan S, Tampakaki AP, Lépine F, Padfield KE, Saucier M, Xiao G, Rahme LG. 2005. The contribution of MvfR to Pseudomonas aeruginosa pathogenesis and quorum sensing circuitry regulation: multiple quorum sensing‐regulated genes are modulated without affecting lasRI, rhlRI or the production of N‐acyl‐L‐homoserine lactones. Molecular microbiology 55:998-1014.

      (5) Lizewski SE, Lundberg DS, Schurr MJ. 2002. The transcriptional regulator AlgR is essential for Pseudomonas aeruginosa pathogenesis. Infection and immunity 70:6083-6093.

      (6) Shao X, Yao C, Ding Y, Hu H, Qian G, He M, Deng X. 2023. The transcriptional regulators of virulence for Pseudomonas aeruginosa: Therapeutic opportunity and preventive potential of its clinical infections. Genes & Diseases 10:2049-2063.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Previous studies in mammals and other vertebrates have shown that a noninvasive measure of cochlear tuning, based on the latency derived from stimulus-frequency otoacoustic emissions, provides a reasonable, and non-invasive, estimate of cochlear tuning. This valuable study confirms that finding in a new species, the budgerigar, and provides convincing support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored primarily in mammals. The study's remaining claims of a mismatch between behavioral frequency selectivity and cochlear tuning are based on old behavioral data, and collected in an extreme frequency region at the edge of the limits of hearing. Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, and the evidence for these claims is weak.

      We appreciate the detailed summary of our paper by the editors highlighting its strengths. As described in the following responses, we added evidence to the Introduction showing that (1) budgerigars have unusual behavioral frequency tuning compared to other bird species and (2) these unusual behavioral tuning results are not readily explained by the audiogram. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars. Moreover, that the behavioral data are “old” seems not particularly relevant considering that the same behavioral methods are still widely used in animal research, as elaborated upon in the responses below. We suggest the term “previously published” to clarify the behavioral data used in our analyses.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, the authors provide compelling evidence that stimulus-frequency otoacoustic emission (SFOAE) phase-gradient delays predict the sharpness (quality factors) of auditory-nerve-fiber (ANF) frequency tuning curves in budgerigars. In contrast with mammals, neither SFOAE- nor ANF-based measures of cochlear tuning match the frequency dependence of behavioral tuning in this species of parakeet. Although the reason for the discrepant behavioral results (taken from previous studies) remains unexplained, the present data provide significant and important support for the utility of otoacoustic estimates of cochlear tuning, a methodology previously explored only in mammals.

      Strengths:

      * The OAE and ANF data appear solid and believable. (The behavioral data are taken from previous studies.)

      * No other study in birds (and only a single previous study in mammals) has combined behavioral, auditory-nerve, and otoacoustic estimates of cochlear tuning in a single species.

      * SFOAE-based estimates of cochlear tuning now avoid possible circularity and are obtained by assuming that the tuning ratio estimated in chicken applies also to the budgerigar.

      Weaknesses:

      * In mammals, accurate prediction of neural Q_ERB from otoacoustic N_SFOAE involves the application of species-invariance of the tuning ratio combined with an attempt to compensate for possible species differences in the location of the so-called apical-basal transition (for a review, see Shera & Charaziak, Cochlear frequency tuning and otoacoustic emissions. Cold Spring Harb Perspect Med 2019; 9:pii a033498. doi: 10.1101/cshperspect.a033498; in particular, the text near Eq. 2 and the value of CFa|b).

      Despite this history, the manuscript makes no mention of the apical-basal transition, its possible role in birds, or why it was ignored in the present analysis. As but one result, the comparative discussion of the tuning ratio (paragraph beginning on line 383) is incomplete and potentially misleading. Although the paragraph highlights differences in the tuning ratio across groups, perhaps these differences simply reflect differences in the value of CFa|b. For example, if the cochlea of the budgerigar is assumed to be entirely "apical" in character (so that CFa|b is around 7-8 kHz), then the budgerigar tuning ratios appear to align remarkably well with those previously obtained in mammals (see Shera et al 2010, Fig 9).

      We added sections on the apical-basal transition to the Results and Discussion, including how this concept might apply in budgerigars and other birds.

      * For the most part, the authors take previous behavioral results in budgerigar at face value, attributing the discrepant behavioral results to hypothesized "central specializations for the processing of masked signals". But before going down this easy road, the manuscript would be stronger if the authors discussed potential issues that might affect the reliability of the previous behavioral literature. For example, the ANF data show that thresholds rise rapidly above about 5 kHz. Might the apparent broadening of the behavioral filters arise as a consequence of off-frequency listening due to the need to increase signal levels at these frequencies? Or perhaps there are other issues. Inquiring readers would appreciate an informed discussion.

      This is a good point, also raised by reviewer 2, that declining audibility above 4 kHz could impact behavioral tuning estimates. On the other hand, other bird species with highly similar audiograms to budgerigars show conventional behavioral tuning that increases in sharpness relatively slowly and monotonically for higher frequencies. Thus, the unusual pattern of behavioral tuning in budgerigars is not fully explainable by the audiogram. We added a section to the Introduction highlighting these points.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes two new sets of data involving budgerigar hearing: 1) auditory-nerve tuning curves (ANTCs), which are considered the 'gold standard' measure of cochlear tuning, and 2) stimulus-frequency otoacoustic emissions (SFOAEs), which are a more indirect measure (requiring some assumptions and transformations to infer cochlear tuning) but which are non-invasive, making them easier to obtain and suitable for use in all species, including humans. By using a tuning ratio (relating ANTC bandwidths and SFOAE delay) derived from another bird species (chicken), the authors show that the tuning estimates from the two methods are in reasonable agreement with each other over the range of hearing tested (280 Hz to 5.65 kHz for the ANTCs), and both show a slow monotonic increase in cochlear tuning quality over that range, as expected. These new results are then compared with (much) older existing behavioral estimates of frequency selectivity in the same species.

      Strengths:

      This topic is of interest, because there are some indications from the older behavioral literature that budgerigars have a region of best tuning, which the current authors refer to as an 'acoustic fovea', at around 4 kHz, but that beyond 5 kHz the tuning degrades. Earlier work has speculated that the source could be cochlear or higher (e.g., Okanoya and Dooling, 1987). The current study appears to rule out a cochlear source to this phenomenon.

      Weaknesses:

      The conclusions are rendered questionable by two major problems.

      The first problem is that the study does not provide new behavioral data, but instead relies on decades-old estimates that used techniques dating back to the 1970s, which have been found to be flawed in various ways. The behavioral techniques that have been developed more recently in the human psychophysical literature have avoided these well-documented confounds, such as nonlinear suppression effects (e.g., Houtgast, https://doi.org/10.1121/1.1913048; Shannon, https://doi.org/10.1121/1.381007; Moore, https://doi.org/10.1121/1.381752), perceptual confusion between pure-tone maskers and targets (e.g., Neff, https://doi.org/10.1121/1.393678), beats and distortion products produced by interactions between simultaneous maskers and targets (e.g., Patterson, https://doi.org/10.1121/1.380914), unjustified assumptions and empirical difficulties associated with critical band and critical ratio measures (Patterson, https://doi.org/10.1121/1.380914), and 'off-frequency listening' phenomena (O'Loughlin and Moore, https://doi.org/10.1121/1.385691). More recent studies, tailored to mimic to the extent possible the techniques used in ANTCs, have provided reasonably accurate estimates of cochlear tuning, as measured with ANTCs and SFOAEs (Shera et al., 2003, 2010; Sumner et al., 2010). No such measures yet exist in budgerigars, and this study does not provide any. So the study fails to provide valid behavioral data to support the claims made.

      We appreciate the reviewer’s efforts in summarizing and critiquing our study. We feel that the budgerigar data collected by the Dooling and Saunders labs remain essentially valid today. The methods used in these behavioral studies are rigorous and remain widely used in animal research (e.g., critical bands and ratios: Yost & Shofner, 2009; King et al., 2015; simultaneous masking: Burton et al., 2018). The methods are based on the same power-spectrum-model assumptions of auditory masking as even the most recent and elaborate human psychophysical procedures. We therefore believe that it remains highly relevant to test and report whether these methods can accurately predict cochlear tuning. More importantly, while forward-masking behavioral results are hypothesized to more accurately predict cochlear tuning in humans (Shera et al., 2002; Joris et al., 2011; Sumner et al., 2018), evidence from nonhumans is controversial. For example, one study showed a closer match between forward-masking results and auditory-nerve tuning (ferret: Sumner et al., 2018), whereas several others showed a close match for simultaneous masking results (e.g., guinea pig, chinchilla, macaque; reviewed by Ruggero & Temchin, 2005; see Joris et al., 2011 for macaque auditory-nerve tuning). Moreover, forward- and simultaneous-masking results can often be equated with a simple scaling factor (e.g., Sumner et al., 2018). Given no consensus on an optimal behavioral method, and seemingly limited potential for the “wrong” method to fundamentally transform the shape of the behavioral tuning quality function, it seems reasonable to accept previously published behavioral tuning estimates as valid while also discussing limitations and remaining open to alternative interpretations. We added these points to the discussion and added clarification throughout as to the specific behavioral approaches used.

      The second, and more critical, problem can be observed by considering the frequencies at which the old behavioral data indicate a worsening of tuning. From the summary shown in the present Fig. 2, the conclusion that behavioral frequency selectivity worsens again at higher frequencies is based on four data points, all with probe frequencies between 5 and 6 kHz. Comparing this frequency range with the absolute thresholds shown in Fig. 3 (as well as from older budgerigar data) shows it to be on the steep upper edge of the hearing range. Thus, we are dealing not so much with a fovea as the point where hearing starts to end. The point that anomalous tuning measures are found at the edge of hearing in the budgerigar has been made before: Saunders et al. (1978) state in the last sentence of their paper that "the size of the CB rapidly increases above 4.0 kHz and this may be related to the fact that the behavioral audibility curve, above 4.0 kHz, loses sensitivity at the rate of 55 dB per octave."

      Hearing abilities are hard to measure accurately on the upper frequency edge of the hearing range, in humans as well as in other species. The few attempts to measure human frequency selectivity at that upper edge have resulted in quite messy data and unclear conclusions (e.g., Buus et al., 1986, https://doi.org/10.1007/978-1-4613-2247-4_37). Indeed, the only study to my knowledge to have systematically tested human frequency selectivity in the extended high frequency range (> 12 kHz) seems to suggest a substantial broadening, relative to the earlier estimates at lower frequencies, by as much as a factor of 2 in some individuals (Yasin and Plack, 2005; https://doi.org/10.1121/1.2035594) - in other words by a similar amount as suggested by the budgerigar data. The possible divergence of different measures at the extreme end of hearing could be due to any number of factors that are hard to control and calibrate, given the steep rate of threshold change, leading to uncontrolled off-frequency listening potential, the higher sound levels needed to exceed threshold, as well as contributions from middle-ear filtering. As a side note, in the original ANTC data presented in this study, there are actually very few tuning curves at or above 5 kHz, which are the ones critical to the argument being forwarded here. To my eye, all the estimates above 5 kHz in Fig. 3 fall below the trend line, potentially also in line with poorer selectivity going along with poorer sensitivity as hearing disappears beyond 6 kHz.

      This is an excellent point, also raised by reviewer 1, that declining audibility above 4 kHz could influence behavioral tuning measures. While we acknowledge this possibility, declining audibility cannot fully explain the unusual pattern of behavioral frequency tuning in budgerigars considering that other bird species with the same audiogram phenotype show conventional tuning patterns. We added these points to the Introduction and Fig. 1B. We also added clarification throughout that it is not just the shape of the tuning function that is noteworthy in budgerigars, but also the extreme slope in the 1-3.5 kHz region. Behavioral tuning quality in budgerigars increases by 5.3 dB/octave in this range (i.e., nearly doubling with each octave increase in frequency), vs. 1.8 dB/octave in humans, 2.5 dB/octave in ferret, 1.1 dB/octave in macaque, and 1.9 dB/octave in starling. This additional background information, including Fig. 1B, substantially strengthens the claim of mismatched behavioral and neural/otoacoustic frequency tuning in budgerigars.
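      The slopes quoted above can be turned into multiplicative changes in tuning quality per octave. The sketch below assumes that tuning quality is expressed in decibels as 20·log10 of the quality factor; the response does not state the convention, so this is an assumption, and the function and parameter names are hypothetical. Under that assumption, 5.3 dB/octave corresponds to a factor of roughly 1.84 per octave, which is consistent with the description "nearly doubling".

```python
def per_octave_factor(slope_db_per_octave: float,
                      db_per_tenfold: float = 20.0) -> float:
    """Convert a slope in dB/octave to a multiplicative change per octave.

    db_per_tenfold is the number of dB that corresponds to a tenfold change;
    20.0 reflects the assumed 20*log10 convention (not stated in the source).
    """
    return 10.0 ** (slope_db_per_octave / db_per_tenfold)


# Slopes as quoted in the response (dB/octave, 1-3.5 kHz region).
slopes = {
    "budgerigar": 5.3,
    "human": 1.8,
    "ferret": 2.5,
    "macaque": 1.1,
    "starling": 1.9,
}

for species, slope in slopes.items():
    print(f"{species}: x{per_octave_factor(slope):.2f} per octave")
```

      If the slopes were instead defined with 10·log10, passing db_per_tenfold=10.0 would give the corresponding factors.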

      The basic question posed in the current study title and abstract seems a little convoluted (why would you expect a behavioral measure to reflect cochlear mechanics more accurately than a cochlear-based emissions measure?). A more intuitive (and likely more interesting) way of framing the question would be "What is the neural/mechanical source of a behaviorally observed acoustic fovea?" Unfortunately, this question does not lend itself to being answered in the budgerigar, as that 'fovea' turns out to be just the turning point at the end of the hearing range. There is probably a reason why no other study has referred to this as an acoustic fovea in the budgerigar.

      Overall, a safe interpretation of the data is that hearing starts to change (and becomes harder to measure) at the very upper frequency edge, and not just in budgerigars. Thus, it is difficult to draw any clear conclusions from the current work, other than that the relations between ANTC and SFOAEs estimates of tuning are consistent in budgerigar, as they are in most (all?) other species that have been tested so far.

      We removed the term fovea from the paper. See above for our argument that unusual behavioral tuning in budgerigars is not simply or fully explainable by the audiogram.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 34. As far as I could tell, no other study has referred to this region in budgerigar as an acoustic fovea. Probably for good reason (see above). This wording should probably be avoided.

      We removed the term.

      Line 35. Describing 3.5-4 kHz as 'mid-frequencies' is a stretch. 4 kHz is actually the corner frequency, above which hearing degrades.

      We added a more detailed and accurate description of the tuning pattern.

      Lines 89-91. This seems a nice statement of the problem, and to my mind makes for a much better rationale for the study.

      Line 255. "mixed effect" should be "mixed effects".

      We made the correction.

      Line 380. Kuhn and Saunders didn't measure high enough to detect any changes in tuning.

      We removed the reference here.

    1. Author response:

      Public Reviews:  

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase (i.e., 10 trials per condition). This trial number was chosen to ensure comparability with previous studies employing similar designs and research questions (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We will acknowledge this limitation and future direction in the revised manuscript. 

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify that participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We will add this information to the revised version of the manuscript. 

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We will address this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session. 

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that it is unfortunate that 25 participants were conditioned with an unbalanced number of trials per condition during the conditioning session. In the revised version of the manuscript, we will include additional analyses to demonstrate that this imbalance did not systematically bias the results and that the findings observed during the test phase remain robust despite this error.  

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the large sample size, methodological rigor, and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We apologize that the registration number alone does not directly lead to the preregistration of this study. We thank the reviewer for pointing this out and will include a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for this comment and suggestion. In the revised version, we will restructure the manuscript and include more detailed information about the key experimental procedures and design at the beginning of the Results section to enhance clarity and improve the interpretability of the reported findings.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Previous studies have shown that the MSH6 family of mismatch repair proteins contains an unstructured N-terminal domain that contains either a PWWP domain, a Tudor domain or neither and that the interaction of the histone reader domains with the appropriate histone H3 modification enhances mismatch repair, and hence reduces mutation rates in coding regions to some extent. However, the elimination of the MSH6-histone modification probably does not completely eliminate mismatch repair, although the published papers on this point do not seem definitive.

      In this study, the authors perform a detailed phylogenetic analysis of the presence of the PWWP and Tudor domains in MSH6 proteins across the tree of life. They observe that there are basically three classes of organisms that contain either a PWWP domain, a Tudor domain, or neither. On the basis of their analysis, they suggest that this represents convergent evolution of the independent acquisition of histone reader domains and that key amino acid residues in the reader domains are selected for.

      Strengths:

      The phylogenetic aspects of the work seem well done and the basic evolutionary conclusions of the work are well supported. The basic evolutionary conclusions are interesting and there is little to criticize from my perspective.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      A major concern about this paper is that the authors fail to put their work into the proper context of what is already known about the N-terminus of MSH6. Further, their structural studies, which are really structural illustrations, are misleading, often incorrect, and not always helpful in addition to having been published before.

      Thank you for the helpful suggestions on this front. We agree that some of the structural visualizations were oversimplified and apologize for the lack of clarity. Notably, we did not annotate the presence of putative or known short PCNA-interacting protein (PIP) motifs, which have been found in the disordered linker at the N-terminus of MSH6 proteins. Indeed, while not directly relevant to our investigation of the origins of histone readers, the PIP motifs are an interesting and functionally important feature of MSH6 structural biology, especially because they may facilitate DNA repair processes more generally. In the revised manuscript, we aim to improve the scholarship on this topic and clarify the presence and importance of this motif for MSH6 function, as well as what is known about the structural biology of the MSH6 N-terminus more broadly. We will add annotations of the PIP motif and will also improve structural prediction by visualizing MSH6 structure in its dimerized form with MSH2, for a more accurate estimate of its folding in vivo. We hope that these changes, together with the other valuable suggested improvements, will enhance the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this work, Monroe JG and colleagues show a compelling case of convergent evolution in the fusion between an important mismatch repair protein (MSH6) and histone reader domains across the tree of life. These fused MSH6 readers have been shown to be important for the recruitment of MSH6 to exon-rich genome locations, therefore improving the efficiency of reducing mutation rates in coding regions.

      The comparative genomic analyses performed here revealed independent instances of MSH6 fusion with histone readers in plants and metazoa with several instances of putative loss (or gain) across the phylogeny. The work also unveiled instances of MSH6 fusion with putatively interesting domains in fungi which might be worth exploring in the future.

      The authors also show potential signatures of purifying selection in functional amino acids of MSH6 histone readers.

      Overall the approach is adequate for the questions proposed to be answered, the analyses are rigorous and support the authors' claims.

      DNA repair genes are essential to maintain genome stability and fidelity, and alterations in these pathways have been associated with hypermutation phenotypes in the context for instance of cancer in humans, with sometimes implications in treatment resistance. This is an important work that contributes to our understanding of the evolutionary consequences of the evolution of epigenome-targeted DNA repair.

      Strengths:

      The methods used are adequate for the questions and support the results. The search for MSH6 fusions was rigorous and conservative, which strengthens the significance of the claims on the evolutionary history of these fusion events.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      I did not identify any major weaknesses, but please see my suggestions/recommendations.

      Thank you, we will also address your suggestions, which provide valuable recommendations for improving the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript entitled "Convergent evolution of epigenome recruited DNA repair across the Tree of Life", Monroe et al. investigate bioinformatically how some important mechanisms of epigenome-targeted DNA repair evolved at the tree of life scale. They provide a clear example of convergent evolution of these mechanisms between animals and plants, investigating more than 4000 eukaryotic genomes, and uncovering a significant association between gain/retention of such mechanisms with genome size and high intron content, that at least partially explains the evolutionary patterns observed within major eukaryotic lineages.

      Strengths:

      The manuscript is well written, clear, and understandable, and has potentially broad interest. It provides a thorough analysis of the evolution of MSH6-related DNA repair mechanisms using more than 4000 eukaryotic genomes, a pretty impressive number allowing to identify both large-scale (i.e. kingdoms) as well as shorter-scale (i.e. phyla, orders) evolutionary patterns. Moreover, despite providing no experimental validation, it investigates with a sufficient degree of depth, a potential relationship between gain/retention of epigenome recruited DNA repair mediated by MSH6 and genomic, as well as life-history (population size, body mass, lifespan), traits. In particular, it provides convincing evidence for a causative effect between genome size/intron content and the presence/absence of this mechanism. Moreover, it stimulates further scientific investigation and biological questions to be addressed, such as the conservation of epigenomes across the tree of life, the existence of potential trade-offs in gain/retention vs. loss of such mechanisms, and the relationship between these processes, mutation rate heterogeneity, and evolvability.

      Thank you for the positive evaluation. We appreciate your interest and review.

      Weaknesses:

      Despite the interesting and necessary insights provided on (1) the evolution of DNA repair mechanisms, and (2) the convergent evolution of molecular mechanisms, this bioinformatic study emanates from studies in humans and Arabidopsis already showing signs of potential convergent evolution in aspects of epigenome-recruited DNA repair. For this, this study, although bioinformatically remarkably thorough, does not come as a surprise, potentially lowering its novelty.

      What could have increased further its impact, interest, and novelty could have been a more comprehensive understanding of the causative processes leading to gain/retention vs. loss of MSH6-related epigenetic recruitment mechanisms. The authors provide interesting associations with life-history traits (yet not significant), and significant links with genome size and intron content only at the theoretical level. For the first aspect, the analyses could have expanded toward other life-history traits. For the second, maybe it could have been even possible to tackle experimentally some of the generated questions, functionally in some models, or deepened using specific case studies.

      We agree that this work expands on recent experimental work in humans and Arabidopsis on the function of histone readers in MSH6, PWWP and Tudor, respectively. However, the evolution of these fusions remained a significant knowledge gap, limiting the degree to which functional work could be translated to other organisms. This study characterizes the evolutionary history of MSH6 histone readers and lays the groundwork for future investigations in diverse species. We agree that more causal inference would be valuable to understand the evolutionary pressures acting on MSH6 histone reader presence/absence. Indeed, we prioritized the conservative approach of testing hypotheses with strict phylogenetically constrained contrasts. While we observed highly significant associations between histone readers and genomic traits like intron content, associations with life history traits were only significant before accounting for phylogeny. It is possible that this is due to a lack of power because such traits are only available in limited taxa. In the revised manuscript, we aim to clarify potential causes, outline future experimental work beyond the scope of this individual study, and argue that this work highlights the need to catalog trait diversity at broader phylogenetic scales. We also address other valuable suggestions in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Is peristimulus alpha (8-14 Hz) frequency and/or phase involved in shaping the length of visual and audiovisual temporal binding windows, as posited by the discrete sampling hypothesis? If so, to what extent and perceptual scenario are they functionally relevant? The authors addressed such questions by collecting EEG data during the completion of the widely-known 2-flash fusion paradigm, administered both in a standard (i.e., visual only, F2) and audiovisual (i.e., 2 flashes and 1 beep, F2B1) fashion. Instantaneous frequency estimation performed over parieto-occipital sensors revealed slower alpha rhythms right after stimulus onset in the F2B1 condition, as compared to the F2, a pattern found to correlate with the difference between modality-specific ISIs (F2B1-F2). Of note, peristimulus alpha frequency differed also between 1 vs 2 flashes reports, although in the visual modality only (i.e., faster alpha oscillations in 2 flash percept vs 1 flash). This pattern of results was reinvigorated in a causal manner via occipital tACS, which was capable of, respectively, narrowing down vs enlarging the temporal binding window of individuals undergoing 13 Hz vs 8 Hz stimulation in the F2 modality alone. To elucidate what the oscillatory signatures of crossmodal integration might be, the authors further focused on the phase of posterior alpha rhythms. Accordingly, the Phase Opposition Sum proved to significantly differ between modalities (F2B1 vs F2) during the prestimulus time window, suggesting that audiovisual signals undergo finer processing based on the ongoing phase of occipital alpha oscillations, rather than the speed at which these rhythms cycle. As a last bit of information, a computational model factoring in the electrophysiological assumptions of both the discrete sampling hypothesis and auditory-induced phase-resetting was devised. Analyses run on such synthetic data were partially able to reproduce the patterns witnessed in the empirical dataset. 
      While faster frequency rates broadly provide a higher probability to detect 2 flashes instead of 1, the occurrence of a concurrent auditory signal in cross-modal trials should cause a transient elongation (i.e. slower frequency rate) of the ongoing alpha cycle due to phase-reset dynamics (as revealed via inter-trial phase clustering), prompting larger ISIs during F2B1 trials. Conversely, the model provides that alpha oscillatory phase might predict how well an observer dissociates sensory information from noise (i.e., perceptual clarity), with the second flash clearly perceived as such as long as it falls within specific phase windows along the alpha cycle.

      Strengths:

      The authors leveraged complementary approaches (EEG, tACS, and computational modelling), the results thereof not only integrate, but depict an overarching mechanistic scenario elegantly framing phase-resetting dynamics into the broader theoretical architecture posited by the discrete sampling hypothesis. Analyses on brain oscillations (either via frequency sliding and phase opposition sum) mostly appear to be methodologically sound, and very-well supported by tACS results. Under this perspective, the modelling approach serves as a convenient tool to reconcile and shed more light on the pieces of evidence gathered on empirical data, returning an appealing account on how cross-modal stimuli interplay with ongoing alpha rhythms and differentially affect multisensory processing in humans.

      Weaknesses:

      Some information relative to the task and the analyses is missing. For instance, it is not entirely clear from the text what the number of flashes actually displayed in explicit short trials is (1 or 2?). We believe it is always two, but it should be explicitly stated.

      We thank the reviewer for highlighting this important point. In our study, all explicit trials consistently presented two flashes. We will clearly state this detail in the Methods section to avoid any further confusion.

      Moreover, the sample size might be an issue. As highlighted by a recent meta-analysis on the matter (Samaha & Romei, 2024), an underpowered sample size may very well drive null-findings relative to tACS data in F2B1 trials, in interplay with broad and un-individualized frequency targets.

      We thank the reviewer for raising this point. First, we would like to clarify that our results do not suggest that the frequency effect is absent in the F2B1 condition; rather, it is relatively attenuated compared to the F2 condition. If the sample size were the primary issue, we would expect to observe a null effect in both conditions. Instead, the stronger frequency modulation in F2 confirms that the sound-induced modulation is present, albeit reduced in the audiovisual context. In our revised manuscript, we will explicitly note that our claim is not that there is no frequency effect in F2B1 but that the effect is weaker relative to F2, and we will also acknowledge the potential limitations associated with sample size and the lack of individualized frequency targeting.

      Some criticality arises regarding the actual "bistability" of bistable trials, as the statistics relative to the main task (i.e., the actual means and SEMs are missing) broadly point toward a higher proclivity to report 2 instead of 1 flash in both F2B1 and F2 trials. This makes sense to some extent, given that 2 flashes have always been displayed (at least in bistable trials), yet tells about something botched during the pretest titration procedure.

      We thank the reviewer for pointing out the potential bias toward reporting “two flashes” in the bistable trials. Because our experimental design involves presenting two flashes in both explicit and bistable trials, a slight tendency to report two flashes may naturally arise, especially at threshold levels determined during pretesting. We believe, however, that this bias does not undermine our primary findings. Our psychophysical procedure is designed to align the inter-stimulus interval with each participant’s fusion threshold, aiming for a near 50/50 split between “one-flash” and “two-flash” reports. However, given that two flashes are always presented, participants may be predisposed to report two flashes when uncertain. This reflects a plausible perceptual bias inherent in the bistable design, rather than a systematic flaw. Importantly, this tendency appears at comparable levels in both the F2 and F2B1 conditions, indicating that it does not selectively affect any particular condition. In the revised manuscript, we will include additional descriptive statistics, such as means and standard deviations, to demonstrate that the observed bias remains within an acceptable range and does not compromise our core conclusions regarding the modulatory effect of auditory input on visual integration.

      Coming to the analyses on brain waves, one main concern relates to the phase-reset-induced slow-down of posterior alpha rhythms being of true oscillatory nature, rather than a mere evoked response (i.e., not sustained over time).

      We appreciate the reviewer’s concern regarding this issue. First, the sustained decrease in posterior alpha frequency observed in our study—persisting for approximately 280 ms—substantially exceeds the typical duration of an auditory evoked potential (generally 50–200 ms) (Näätänen and Picton, 1987). This extended period of modulation suggests that it is not merely a transient evoked response.

      Second, our analysis of alpha power further supports this interpretation. A purely evoked response is usually accompanied by a corresponding increase in signal power; however, our results show no such power increase when comparing the F2B1 condition with the F2 condition.

      Moreover, the observed increase in alpha phase resetting—as measured by inter-trial phase coherence (ITC)—does not significantly correlate with changes in alpha power. This dissociation further indicates that the auditory-induced effects are unlikely to be driven solely by evoked potentials, but are more consistent with a reorganization of the intrinsic neural oscillatory activity.

      Together, these lines of evidence strongly support the view that the auditory-induced decrease in alpha frequency reflects true changes in ongoing oscillatory dynamics, rather than being merely a transient evoked response.
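
      For readers unfamiliar with the measure, the inter-trial phase coherence referred to above can be illustrated with a minimal sketch. It assumes that one phase angle per trial (in radians) has already been extracted at a given time-frequency point; the function name and the synthetic data are hypothetical and do not reproduce the analysis pipeline used in the study.

```python
import cmath
import math
import random

def inter_trial_coherence(phases):
    """Length of the mean unit phase vector across trials.

    Values near 0 mean phases are scattered; 1 means identical phase on every trial.
    """
    if not phases:
        raise ValueError("need at least one trial")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector)

# Hypothetical data: one phase value per trial at a single time-frequency point.
rng = random.Random(0)
scattered = [rng.uniform(-math.pi, math.pi) for _ in range(500)]  # no phase reset
clustered = [rng.gauss(0.5, 0.3) for _ in range(500)]             # phases aligned after a reset

print(inter_trial_coherence(scattered), inter_trial_coherence(clustered))
```

      Because each trial contributes a unit-length vector, this quantity is insensitive to amplitude by construction, which is why it can dissociate from changes in power.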

      Another question calling for some further scrutiny regards the overlooked pattern linking the temporal extent of the IAF differences between F2 and F2B1 trials with the ISIs across experimental conditions (explicit short, bistable, and explicit long). That is, the wider the ISI, the longer the temporal extent of the IAF difference between sensory modalities. Although neglected by the authors, such a trend speaks in favour of a rather nuanced scenario stemming from not only auditory-induced phase-reset alpha cycle elongation, but also some non-linear and perhaps super-additive contribution of flash-induced phase-resetting. This consideration introduces some of the issues about the computational simulation, which was modelled around the assumption of phase-resetting being triggered by acoustic stimuli alone. Given how appealing the model already is, I wonder whether the authors might refine the model accordingly and integrate the phase-resetting impact of visual stimuli upon synthetic alpha rhythms.

      We appreciate the reviewer’s insightful comment regarding the potential influence of flash-induced phase resetting on the temporal extent of the IAF differences. We acknowledge that the observation—that wider ISIs are associated with a longer period of IAF differences—hints at a non-linear or even super-additive interaction between auditory- and flash-induced phase resetting mechanisms.

      However, the primary focus of our current study is on how auditory stimuli affect alpha oscillatory dynamics. Our experimental design and computational model were specifically optimized to capture auditory-induced phase resetting. Incorporating the additional influence of flash-induced effects would require a significantly more refined experimental framework and a more complex modeling approach. This added complexity could obscure the interpretation of our main findings, which are centered on auditory influences.

      In the revised manuscript, we will address this intriguing possibility in the Discussion section. We will acknowledge that while the data hint at a potential visual contribution, our present model deliberately isolates auditory-induced phase resetting to maintain clarity. We also propose that future research, with more precise experimental designs and enhanced modeling techniques, is necessary to fully disentangle and capture the interplay between auditory and flash-induced phase resetting mechanisms.

      Relatedly, I would also suggest the authors to throw in a few more simulations to explore the parameter space and assay, to which quantitative extent the model still holds (e.g. allowing alpha frequency to randomly change within a range between 8 and 13 Hz, or pivoting the phase delay around 10 or 50 ms).

      We appreciate the reviewer’s suggestion to further explore our model’s parameter space. In response, we will conduct additional simulations that incorporate variability in alpha frequency—sampling randomly between 8 and 13 Hz—and examine alternative phase delays (e.g., around 10 and 50 ms). By systematically adjusting these parameters, we can more thoroughly evaluate the model’s robustness and delineate its boundaries under a broader range of neurophysiological conditions. We will present these results in the revised manuscript and discuss how they inform our understanding of alpha-driven visual integration in cross-modal contexts.
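
      The kind of parameter exploration described above can be sketched with a deliberately simplified toy version of discrete sampling. It assumes that two flashes are reported as two whenever they fall into different alpha cycles, that alpha frequency is drawn uniformly from a range on each trial, and that the phase at first-flash onset is random. Auditory phase resetting and the phase delay are not modelled here, and all names and default values are illustrative assumptions rather than the model used in the manuscript.

```python
import math
import random

def perceives_two(isi_s, freq_hz, phase_frac):
    """True if the two flashes fall into different alpha cycles.

    The first flash occurs at time 0; phase_frac in [0, 1) is the fraction of
    the current cycle that has already elapsed at that moment.
    """
    cycles_elapsed = isi_s * freq_hz + phase_frac
    return math.floor(cycles_elapsed) >= 1

def two_flash_rate(isi_s, f_lo=8.0, f_hi=13.0, n_trials=20000, seed=0):
    """Proportion of simulated trials reported as two flashes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        freq_hz = rng.uniform(f_lo, f_hi)  # alpha frequency varies across trials
        phase_frac = rng.random()          # random phase at first-flash onset
        hits += perceives_two(isi_s, freq_hz, phase_frac)
    return hits / n_trials

# Sweep the inter-stimulus interval for a slow, a fast, and a mixed alpha range.
for isi_ms in (30, 50, 70, 90):
    isi_s = isi_ms / 1000.0
    print(isi_ms,
          two_flash_rate(isi_s, 8.0, 8.0),
          two_flash_rate(isi_s, 13.0, 13.0),
          two_flash_rate(isi_s))
```

      In this toy setting, a faster alpha frequency yields more two-flash reports at a given ISI, which is the qualitative pattern the discrete sampling hypothesis predicts for the unimodal condition.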

      As a last remark, I would avoid, or at least tone down, concluding that the results hereby presented might reconcile and/or explain the null effects in Buergers & Noppeney, 2022; as the relationship between IAFs and audiovisual abilities still holds when examining other cross-modal paradigms such as the Sound-Induced Flash-Illusion (Noguchi, 2022), and the aforementioned patterns might be due to other factors, such as a too small sample size (Samaha & Romei, 2024).

      We appreciate the reviewer’s suggestion and will revise our claims accordingly. In the revised manuscript, we will clarify that while our study demonstrates a mechanism by which alpha oscillations influence audiovisual integration in certain paradigms, this does not mean that our findings fully reconcile all conflicting results in the literature. We will emphasize that our mechanism may help explain why alpha frequency plays a critical role in some experimental settings, but that factors such as sample size, task parameters, and experimental design differences likely contribute to the divergent results observed across studies. Accordingly, we acknowledge that further research with larger samples and more refined methodologies is necessary to fully reconcile these discrepancies. This more cautious interpretation will be clearly discussed in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors used a visual flash discrimination task in which two flashes are presented one after another with different inter-stimulus intervals. Participants either perceive one flash or two flashes. The authors show that the simultaneous presence of an auditory input extends the temporal window of integration, meaning that two flashes presented shortly after one another are more likely to be perceived as a single flash. Auditory inputs are accompanied by a reduction in alpha frequency over visual areas. Prestimulus alpha frequency predicts perceptual outcomes in the absence of auditory stimuli, whereas prestimulus alpha phase becomes the dominant predictor when auditory input is present. A computational model based on phase-resetting theory supports these findings. Additionally, a transcranial stimulation experiment confirms the causal role of alpha frequency in unimodal visual perception but not in cross-modal contexts.

      Strengths:

      The authors elegantly combined several approaches, from behavior to computational modeling and EEG, to provide a comprehensive overview of the mechanisms involved in visual integration in the presence or absence of auditory input. The methods used are state-of-the-art, and the authors attempted to address possible pitfalls.

      Weaknesses:

      The use of Bayesian statistics could further strengthen the paper, especially given that a few p-values are close to the significance threshold (lines 162 & 258), but they are interpreted differently in different cases (absence of effect vs. trend).

      We appreciate the reviewer’s suggestion regarding the use of Bayesian statistics. We agree that a Bayesian framework can offer valuable complementary insights to our analysis by helping to distinguish whether a marginal p-value represents a trend or truly indicates the absence of an effect. To enhance the robustness of our conclusions, we will incorporate supplemental Bayesian analyses in the revised manuscript.
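
      As an illustration of how a Bayes factor can separate a trend from evidence for the absence of an effect, the following minimal sketch uses the BIC approximation for a one-sample or paired t-test. This is an assumption made for illustration only; the supplemental analyses in the revised manuscript may rely on a different prior or software, and the function name is hypothetical.

```python
import math

def bf01_bic(t_value, n):
    """Approximate Bayes factor for H0 over H1 from a one-sample or paired t statistic.

    Uses the BIC approximation: BF01 ~ sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    Values above 1 favour the null; values below 1 favour an effect.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    df = n - 1
    return math.sqrt(n) * (1.0 + t_value ** 2 / df) ** (-n / 2.0)

# Hypothetical example: a marginal t statistic from 25 participants.
print(bf01_bic(2.0, 25))
```

      A value close to 1 indicates that the data do not discriminate well between the two hypotheses, which is the situation in which a marginal p-value is most easily over-interpreted in either direction.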

      Overall, these results provide new insights into the role of alpha oscillations in visual processing and offer an interesting perspective on the current debate regarding the roles of alpha phase and frequency in visual perception. More generally, they contribute to our understanding of the neural dynamics of multisensory integration.

      Reviewer #3 (Public review):

      Summary:

      The authors investigated the impact of an auditory stimulus on visual integration at the behavioral, electrophysiological, and mechanistic levels. Although the role of alpha brain oscillations on visual perception has been widely studied, how the brain dynamics in the visual cortices are influenced by a cross-modal stimulus remains ill-defined. The authors demonstrated that auditory stimulation systematically induced a drop in visual alpha frequency, increasing the time window for audio-visual integration, while in the unimodal condition, visual integration was modulated by small variations within the alpha frequency range. In addition, they only found a role of the phase of alpha brain oscillations on visual perception in the cross-modal condition. Based on the perceptual cycles' theory framework, the authors developed a model allowing them to describe their results according to a phase resetting induced by the auditory stimulation. These results showed that the influence of well-known brain dynamics on one modality can be disrupted by another modality. They provided insights into the importance of investigating cross-modal brain dynamics, and an interesting model that extends the perceptual cycle framework.

      Strengths:

      The results are supported by a combination of various, established experimental and analysis approaches (e.g., two-flash fusion task, psychometric curves, phase opposition), ensuring strong methodological bases and allowing direct comparisons with related findings in the literature.

      The model the authors proposed is an extension and an improvement of the perceptual cycle's framework. Interestingly, this model could then be tested in other experimental approaches.

      Weaknesses:

      There is an increasing number of studies in cognitive neuroscience showing the importance of considering inter-individual variability. The individual alpha frequency (IAF) varied from 8 to 13 Hz with a huge variability across participants, and studies have shown that the IAF influenced visual perception. Investigating inter-individual variations of the IAF in the reported results would be of great interest, especially for the model.

      We appreciate the reviewer’s valuable feedback regarding the importance of inter-individual variability in alpha frequency. In our current study, we have already addressed participant-level variability in our neural data by performing inter-subject correlation analyses, investigating whether individual reductions in alpha frequency correlate with broader temporal integration windows at the behavioral level.

      Moreover, our computational model incorporates physiologically realistic distributions for key parameters, including frequency and amplitude, which captures some degree of individual variability. Nevertheless, we acknowledge that a more targeted examination of how different IAF values specifically affect the model’s predictions would be highly valuable. In response, we will expand our simulations to systematically explore a range of IAF values and assess their impact on temporal integration windows and related measures of audiovisual processing. These additional analyses will help clarify the role of inter-individual variability in alpha frequency and further strengthen the mechanistic account offered by our model. We will detail these enhancements and discuss their implications in the revised manuscript.

      Although the use of non-invasive brain stimulation to infer causality is a method of great interest, the use of tACS in the presented work is not optimal. Instead of inducing alpha brain oscillations in visual cortices, the use of tACS to activate the auditory cortex instead of the actual auditory stimulation would have presented more interest.

      We appreciate the reviewer’s suggestion and acknowledge that non-invasive brain stimulation offers promising avenues for inferring causality. In our study, our primary hypothesis focused on the role of occipital alpha oscillations in defining the temporal window for visual integration, and accordingly we targeted visual cortex in our tACS protocol.

      We recognize that stimulating the auditory cortex could provide additional insights into auditory contributions to phase resetting. However, accurately targeting the auditory cortex with tACS presents technical challenges. The auditory cortex is located deeper within the temporal lobe, and factors such as variable skull thickness and complex current spread make it difficult to reliably modulate its neural activity compared to the more superficial visual areas. Indeed, recent studies have demonstrated that tACS-induced electric fields in the temporal regions tend to be weaker and less focal—for example, Huang et al. (2017) and Opitz et al. (2016) highlight the limitations in achieving robust stimulation of deeper or anatomically complex brain regions using conventional tACS approaches.

      Given these considerations, while we agree that future investigations could benefit from exploring auditory cortex stimulation—either as an alternative or as a complementary approach—the present study remains focused on visual alpha modulation, where our protocol is well validated and yields reliable results. In the revised manuscript, we will clearly discuss these issues and acknowledge the potential, yet technically challenging, possibility of stimulating the auditory cortex in future work to further disentangle the contributions of auditory and visual inputs to cross-modal integration.

    1. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.
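
      To make the notion of an image difference function concrete, the following minimal sketch computes the root-mean-square pixel difference between a stored snapshot and a current panoramic view, minimised over azimuthal rotations. It assumes small grayscale panoramas stored as nested lists; the function names and toy images are hypothetical, and the sketch is a simplified illustration rather than the multi-snapshot model used in our study.

```python
import math

def rms_difference(view_a, view_b):
    """Root-mean-square pixel difference between two equally sized grayscale images."""
    if len(view_a) != len(view_b) or any(len(ra) != len(rb) for ra, rb in zip(view_a, view_b)):
        raise ValueError("views must have the same shape")
    squared = [(a - b) ** 2 for ra, rb in zip(view_a, view_b) for a, b in zip(ra, rb)]
    return math.sqrt(sum(squared) / len(squared))

def rotate_azimuth(view, shift):
    """Rotate a panoramic view by shifting its columns, wrapping around."""
    return [row[shift:] + row[:shift] for row in view]

def rotational_idf(snapshot, current):
    """Smallest RMS difference over all azimuthal rotations, with the shift achieving it."""
    width = len(snapshot[0])
    return min((rms_difference(snapshot, rotate_azimuth(current, s)), s) for s in range(width))

# Toy panorama (2 rows x 4 columns) and the same view rotated by one column.
snapshot = [[0, 0, 1, 0],
            [0, 1, 1, 0]]
current = rotate_azimuth(snapshot, 1)
print(rotational_idf(snapshot, current))
```

      Evaluating such a difference at many positions around the nest yields a surface whose gradient can guide homing; the catchment area is the region from which following that gradient leads back to the snapshot location.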

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not towards the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features that were moved between training and testing (neither the camera nor the wall was moved between training and testing). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate whether bees use ground views or bird's-eye views to home in a dense environment, we think that catchment volumes would provide qualitatively similar, though quantitatively more detailed, information compared with catchment slices. Our approach of using catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will therefore not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, the exact locations where bees learn, and whether they learn continuously by weighting the visual experience based on their positions and orientations, are not always clear. This makes it difficult to categorise these flights accurately into learning and return flights. Additionally, homing models remain unclear about the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
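
      For readers unfamiliar with rotational image difference functions, the idea can be sketched as follows. This is a minimal illustration and not the model used in the study: the panorama size, the random test image, and the names `rotational_idf`, `snapshot`, and `current` are assumptions made for the example.

```python
import numpy as np

def rotational_idf(memory, view):
    """Root-mean-square pixel difference between a stored snapshot and the
    current view, evaluated for every azimuthal rotation of the view.
    Both inputs are 2-D arrays (elevation x azimuth)."""
    memory = np.asarray(memory, dtype=float)
    view = np.asarray(view, dtype=float)
    n_azimuth = view.shape[1]
    ridf = np.empty(n_azimuth)
    for shift in range(n_azimuth):
        # Rotate the panoramic view by `shift` columns and compare to memory.
        rotated = np.roll(view, shift, axis=1)
        ridf[shift] = np.sqrt(np.mean((rotated - memory) ** 2))
    return ridf

# Hypothetical panorama: 10 elevation rows x 72 azimuth columns.
rng = np.random.default_rng(0)
snapshot = rng.random((10, 72))
# The same place seen with the body rotated by 12 columns.
current = np.roll(snapshot, -12, axis=1)

ridf = rotational_idf(snapshot, current)
best_shift = int(np.argmin(ridf))  # rotation that best re-aligns view and memory
```

      The minimum of the function indicates the rotation at which the current view best matches the memorised snapshot. How quickly the minimum becomes shallower with distance from the snapshot location is what determines the range over which the stored view can guide homing.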

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      We thank the reviewer for their comments and address both points.

      Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

      This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding, is that at least the pre-stimulus encoding performance is better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word beyond that of the stimulus (e.g. acoustics) itself. We will make this point more prominent in the revision.

      Regarding the regression: different weights are estimated per time point in a time-resolved regression. This allows for modeling of unfolding responses over time, but also for the learning of stimulus dependencies.
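
      A toy simulation may help to illustrate how a time-resolved regression can pick up stimulus dependencies. The sketch below is not the analysis pipeline of the manuscript. It assumes one signal value per word, first-order autoregressive word features, and a "passive" signal that responds only to the current word; all function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_features(n_words, n_dims, rho, rng):
    """Word-level features; rho > 0 makes neighbouring words correlated."""
    x = np.empty((n_words, n_dims))
    x[0] = rng.standard_normal(n_dims)
    for t in range(1, n_words):
        noise = rng.standard_normal(n_dims)
        x[t] = rho * x[t - 1] + np.sqrt(1.0 - rho ** 2) * noise
    return x

def passive_signal(x, rng, noise_sd=0.5):
    """A signal that only reflects the current word: no prediction involved."""
    return x @ np.ones(x.shape[1]) + noise_sd * rng.standard_normal(len(x))

def lagged_encoding_scores(x, signal, lags, alpha=1.0):
    """Fit separate ridge weights per lag; return held-out correlations.
    A negative lag means the signal is taken before the onset of the word."""
    n_words, n_dims = x.shape
    scores = {}
    for lag in lags:
        t = np.arange(max(0, -lag), min(n_words, n_words - lag))
        feats, target = x[t], signal[t + lag]
        half = len(t) // 2  # first half for fitting, second half for testing
        gram = feats[:half].T @ feats[:half] + alpha * np.eye(n_dims)
        weights = np.linalg.solve(gram, feats[:half].T @ target[:half])
        pred = feats[half:] @ weights
        scores[lag] = float(np.corrcoef(pred, target[half:])[0, 1])
    return scores

rng = np.random.default_rng(1)
lags = [-2, -1, 0, 1]
x_dep = simulate_features(4000, 5, rho=0.6, rng=rng)  # dependent stimulus
x_ind = simulate_features(4000, 5, rho=0.0, rng=rng)  # independent stimulus
scores_dep = lagged_encoding_scores(x_dep, passive_signal(x_dep, rng), lags)
scores_ind = lagged_encoding_scores(x_ind, passive_signal(x_ind, rng), lags)
```

      In this toy setup the pre-onset score (lag -1) is well above zero for the dependent stimulus and close to zero for the independent one, although the simulated signal never anticipates anything. This is the pattern that the acoustics control is meant to expose.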

      To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

      Reviewer #2 (Public review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.

      Reviewer #3 (Public review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically the auto-correlation in the stimuli, rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for their comments.

      We want to address a key unclarity regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that neural encoding results persist after their control analysis (like we did, too, in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics – a control system that, by definition, cannot predict upcoming words. Therefore, the hallmarks of prediction can be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

      This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing results from encoding neural responses to those from a system that encodes the stimulus, and therefore its dependencies, but that cannot predict the upcoming input (like the acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to make a more direct comparison.

      Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to public reviews:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2: “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3: “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.”

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (ctaD encodes a subunit of bcc:aa3) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added this caveat to the discussion of the revised version of this manuscript.

      Response to Reviewers Recommendations for the authors:

      Reviewing Editor Comments:

      In addition to the specific recommendations below, there was consensus that the conclusions/discussion should contextualize that the results cannot exclude that in other conditions (such as in infection), enzymes other than cytochrome bcc:aa3 receive copper from the chalkophore system.

      Reviewer #1 (Recommendations for the authors):

      (1) In the introduction, the authors mention that the nrp operon is only present in pathogenic Mtb and Mycobacterium marinum but not in non-pathogenic mycobacteria. Is the nrp operon present in other pathogenic mycobacteria such as M. leprae, M. avium or M. abscessus?

      Bhatt et al. (PMID 30381350) presented an analysis of the distribution of nrp gene clusters in mycobacteria and concluded that M. bovis, M. leprae and M. canettii clearly encode nrp genes. M. marinum has been shown to have a functional chalkophore biosynthetic cluster, but the presence of this system in other mycobacteria awaits experimental validation. We have added the Bhatt reference to this sentence in the introduction.

      (2) Figure 1A - it would be helpful if the genes were grouped and labeled as per their purpose (for example, CytBD components, bcc:aa3 components). While these are described in the text, the genes belonging to the chalkophore cluster are not defined in the text, and are thus not easily identified in the figure.

      The order of genes in the heatmap is determined by unsupervised clustering as indicated by the dendrogram to the left of the heatmap. To highlight chalkophore and CytBD genes, we have added color coding to the gene names and explained this color coding in the legend. 

      (3) Figure 2B/2C - it is interesting that complementation of ΔnrpΔcydAB with cydABCD does not rescue growth to Δnrp levels. Is there an explanation for this? 

      AND

      (4) Figure 2C - BCS is not introduced in the text for this figure nor are the results described - which seems like an oversight. It is interesting that BCS treatment does have a full rescue with cydABCD complementation, while TTM treatment does not. Is there an explanation for this?

      We thank the reviewer for raising this issue. We have attempted several different complementation constructs, including CydAB alone and different promoters, to address the partial complementation in question. However, we do not have an adequate explanation for this partial complementation. As the reviewer notes, the partial complementation is only evident with TTM, not BCS. However, we cannot speculate on the reason for this difference at present.  We have added a note to the text in the results section noting this difference. 

      (5) Figure 2F - is there a reason for the change in TTM concentrations (50 μM TTM vs 10 μM TTM)? Is the concentration for Q203 in both single treatment and combinatory tests 100nM?  

      We have clarified the 100 nM Q203 concentration in the figure legend. To avoid confusion, we have removed the 50 µM TTM condition from panel F because the growth inhibition phenotype of 10 µM TTM is shown in panel E and is the comparator for the combined TTM/Q203 condition in panel F.

      (6) Figure 3A - I assume d0 = day 0, d3 = day 3. This should be defined.

      We have modified the legend to clarify these abbreviations. 

      (7) Figure 4B - as complementation of nrp for ΔnrpΔcydAB returns levels back to WT, I assume there is no attenuation with ΔcydAB alone? Clarification would be appreciated.

      The mouse phenotype of M. tuberculosis ΔcydAB is reported here: https://www.pnas.org/doi/10.1073/pnas.1706139114#sec-1. This paper is reference 22 of the manuscript and was noted in the discussion.

      Reviewer #2 (Recommendations for the authors):

      In vitro conditions that require SodC could reveal a role for the chalkophore (i.e., exposure to extracellular or periplasmic superoxide stress under low iron conditions). Some minor confusion exists with the terminology around the two oxidases found in M. tuberculosis. The bcc:aa3 oxidase is a supercomplex between the reductase and oxidase complexes. This point should be clarified in the introduction, as the term supercomplex isn't used until later in line 194 and without definition. Referring to the bcc:aa3 supercomplex as an oxidase is fine but is sometimes confusing, especially when mentioning that the target of Q203 is the oxidase, as it targets the reductase portion of the supercomplex.

      We thank the reviewer for this point. We have modified the text to refer to the supercomplex at first mention and modified subsequent mentions to be clearer. 

      In the RNA preparation section boxes appear in several places where spaces should be.

      We do not see these boxes so we suspect this is a conversion error of some type. 

      Reviewer #3 (Recommendations for the authors):

      The authors have very carefully performed their studies and their main conclusions are amply supported by the data. The manuscript is also very clearly written, and easily accessible to a broad audience interested in both bioinorganic chemistry and mycobacteria. I have two recommendations:

      (1) I agree that the evidence shows that chalkophores provide copper to cytochrome bcc:aa3. Under lab-culture conditions, it could well be that, when cytochrome bd is deleted or inhibited, cytochrome bcc:aa3 is rate limiting. Under lab-culture conditions, it is also clear that only the expression of a select number of enzymes is affected. However, this does not mean that cytochrome bcc:aa3 is the ONLY enzyme that receives copper from chalkophores. Thus, under infection conditions, other copper enzymes might be important. For instance, M. tuberculosis expresses a Cu-Zn superoxide dismutase. In summary, perhaps the authors would consider changing the wording of statements such as that in Figure 2E and the conclusions drawn in the discussion.

      This comment concerns the question of whether the bcc:aa3 respiratory supercomplex is the only target of chalkophore delivered copper. In culture, our experiments suggest that the supercomplex is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (ctaD encodes a subunit of the bcc:aa3 supercomplex) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 supercomplex is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrest growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting bcc:aa3, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore mediated Cu acquisition. We have added the following to the discussion: “Although chalkophore mediated protection of the bcc:aa3 supercomplex is an important virulence function, we cannot exclude the possibility that additional copper dependent enzymes use chalkophore delivered copper during infection.”

      (2) There is a difference between copper-uptake (e.g. by chalkophores) and the maturation of metallo-enzymes. A short paragraph discussing knowledge from other bacteria in this area would help readers understand the role of chalkophores (e.g. see 10.1128/mBio.00065-18 or 10.1111/mmi.14701). This could possibly be extended with a genome analysis to check which other proteins are present in M. tuberculosis.

      We thank the reviewer for this point. We agree that our data does not distinguish between 1) a generic role for the chalkophore in copper uptake, with the ultimate candidate metalloenzyme rendered dysfunctional by copper loss, and 2) the chalkophore being an intrinsic part of the cytochrome maturation pathway and interacting directly with the target enzymes. We have added this point to the discussion but have not otherwise added the suggested full discussion of metalloenzyme maturation as we believe this discussion is beyond the scope of our data. 

      Finally, can I suggest the labels d0 and d3 are made clearer in Figure 3A (and defined in the legend).

      We have modified the legend to be clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We thank the editors and Reviewers 1 and 3 for their thoughtful consideration of our manuscript. The present revision is submitted to address comments raised concerning rank determinations and the following sentence in the editorial assessment:

      The evidence that food-washing is deliberate is compelling, but the evidence for variable and adaptive investment depending on rank, including the fitness-relevance and ultimate evolutionary implications of the findings, is incomplete given limitations of the experimental design.

      Close reading of this sentence reveals two parallel threads. The first can be read as “…evidence for variable rank is incomplete given the limitations of the experimental design,” whereas the second can be read as “…evidence for adaptive investment and fitness is incomplete given the limitations of the experimental design.” The first alludes to a critique of our methods, while the second alludes to points of discussion unrelated to our experimental design. Unpacking this sentence is important because it casts the totality of our paper as “incomplete,” a word of consequence for early-career scholars because it prevents indexing in Web of Science.

      For clarity, we will refer to these topics as Thread 1 and Thread 2 in the following response.

      Thread 1 seems rooted in a comment made by Reviewer 1, which is reproduced below:

      I am still struck that there was an analysis of only trials where <3 individuals are present. If rank was important, I would imagine that behavior might be different in social contexts when theft, scrounging, policing, aggression, or other distractions might occur, where rank would have effects on foraging behavior. Maybe lower rankers prioritize rapid food intake then. If rank should be related to investment in this behavior, we might expect this to be magnified (or different) in social contexts where it would affect foraging. It might just be that the data was too hard to score or process in those settings, or the analysis was limited. Additionally, I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal. Since rank is central to the interpretation of these results, I think that reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.

      We are grateful for this perspective of Reviewer 1, but it puts us in an uncomfortable position. We must respond rather forcefully because of its influence on the above assessment. A problem with R1’s comment is that it uses the word “foraging” (a behavior we did not study) instead of “cleaning” (the behavior we did study). Still, we can substitute the latter word for the former to get the gist of it.

      R1 criticizes our methods as a prelude for imagining the behaviors of our study animals, a form of conjecture. R1 correctly supposes a positive relationship between the number of animals and the intensity of competition for a limited food resource, a well-known phenomenon; and, yes, the food in each trial was decidedly limited, being fixed at nine cucumber slices. But R1 incorrectly presumes rank effects on cleaning under conditions of intense food competition. When the number of monkeys participating in a trial exceeded the number of feeding stations (n = 3), we saw little or no cleaning effort, either brushing or washing. So, rank effects on cleaning are immaterial under these conditions. As our study goals were narrowly focused on detecting individual propensities, or choices, as a function of rank, we limited our analysis to trials involving three monkeys or fewer. In retrospect, we admit that we should have provided better justification for our choice of trials, so we’ve edited one of our sentences:

      Original sentence 

      Formerly lines 219-220: To minimize the potential confounding effects of dominance interactions, we analyzed trials with ≤ 3 monkeys.

      Revised sentence

      Current lines 219-224: We excluded trials from analysis if the number of participating monkeys exceeded the number of feeding stations, as these conditions produced high levels of feeding competition with scant cleaning behavior. Such conditions effectively erased individual variation in sand removal, the topic motivating our experiment. Accordingly, we analyzed trials with ≤ 3 monkeys, putting 937 food-handling bouts into the GLMM statistical models, which included data on individual rank, sex, and sand treatment.

      R1’s final criticism – “I think that more robust metrics of rank from more densely sampled focal follow data would be a better measure, but I acknowledge the limitations in getting the ideal” – seems to imply that rank data were collected during our experiment. On the contrary, we determined ranks from five years of focal follows preceding the experiment, achieving the very standard that R1 describes as ideal. The relevant text appeared on lines 165-169 in version 2.0:

      To determine the rank-order of adults, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). In some cases, these data were supplemented with ad libitum observations. This protocol existed during five years (2013-2018) of continual observations before we conducted our experiment in July-August 2018. 

      Naturally, we were puzzled by R1’s dismissal of our methods, as well as R1’s conclusion, reached without evidence, that “[the] reduced social contexts in which rank was analyzed and the robustness of the data from which rank was calculated and analyzed are the main weaknesses of the evidence presented in this paper.” It is an unsubstantiated assertion with no definition of robustness, making it difficult for anyone to objectively assess the quality of our data.

      We detect in R1’s words some unfamiliarity with the social organization of our study species, which is fair enough. To better orient readers to the dominance hierarchy of Macaca fascicularis, and to boost reader confidence in the volume and quality of our rank data, we have added several sentences to this section of the manuscript, lines 169-183:

      Macaques form multi-male multi-female (polygynandrous) social groups with individual dominance hierarchies. In M. fascicularis, the hierarchy is strictly linear and extremely steep, meaning aggression is unidirectional (de Waal, 1977; van Noordwijk and van Schaik, 2001) with profound asymmetries in outcomes for individuals of adjacent ranks (Balasubramaniam et al., 2012). Further, the dominance hierarchies of philopatric females are stable and predictable. Daughters follow the pattern of youngest ascendancy, ranking just below their mothers with few known exceptions among older sisters (de Waal, 1977; van Noordwijk and van Schaik, 1999). Taken together, these species traits are conducive to unequivocal rank determinations. 

      To determine the rank-order of adults in our study group, we recorded dyadic agonistic interactions and their outcomes (i.e., aggression, supplants, and silent-bared-teeth displays of submission) during 5-min focal follows of individuals based on a randomized order of continuous rotation (Tan et al., 2018). These data were supplemented with ad libitum observations and all rank determinations were updated monthly, and when males immigrated or emigrated. This protocol predates our experiment in July-August 2018, representing 970 hr of focal data during five years of systematic study (2013-2018). 

      Thread 2 criticizes our evidence for adaptive investment and fitness, describing it as a limitation of our experimental design. Accordingly, the totality of our experiment was classified as “incomplete.” Yet, our experiment was never designed to collect such evidence, and we make no claims of having it. Rather, we discussed potential fitness consequences to highlight the broader significance of our study, connecting it to diverse bodies of literature, from evolutionary theory to paleoanthropology. Our intent was to follow the conventions of scientific writing; to put our results into conversation with the wider literature and set an agenda for future research.

      On reflection, Thread 2 seems to pivot around something as arbitrary as structure. Previously, our results and discussion were combined under a single section header (“Results and Discussion”), a stylistic choice to economize words. Our manuscript is a Short Report, which is limited to 1,500 words of main text. But this level of concision proved counterproductive. It blurred our results and discussion in the minds of readers. Indeed, Reviewer 3 described it as “misleading,” a barbed word that accomplishes the same act attributed to us. To counter this perspective, we have simply partitioned our Results (now “Experimental Results”) and Discussion to draw a sharper distinction between the two components of our paper.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Muramoto and colleagues have examined a mechanism by which the executioner caspase Drice is activated in a non-lethal context in Drosophila. The authors have comprehensively examined this in the Drosophila olfactory receptor neurons using sophisticated techniques. In particular, they had to engineer a new reporter by which non-lethal caspase activation could be detected. The authors conducted a proximity labeling experiment and identified Fasciclin 3 as a key protein in this context. While the removal of Fasciclin 3 did not block non-lethal caspase activation (likely because of redundant mechanisms), its overexpression was sufficient to induce non-lethal caspase activation.

      Strengths:

      While non-lethal functions of caspases have been reported in several contexts, far less is known about the mechanisms by which caspases are activated in these non-lethal contexts. So, the topic is very timely. The overall detail of this work is impressive and the results for the most part are well-controlled and justified.

      Weaknesses:

      The behavioral results shown in Figure 6 need more explanation and clarification (more details below). As currently shown, the results of Figure 6 seem uninterpretable. Also, overall presentation of the Figures and description in legends can be improved.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a technical perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript with a particular focus on Figure 6, as well as the overall presentation of the figure and its description in the legends, in accordance with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewer #2 (Public review):

      In this study, the authors investigate the role of caspases in neuronal modulation through non-lethal activation. They analyze proximal proteins of executioner caspases using a variety of techniques, including TurboID and a newly developed monitoring system based on Gal4 manipulation, called MASCaT. They demonstrate that overexpression of Fas3G promotes the non-lethal activation of caspase Dronc in olfactory receptor neurons. In addition, they investigate the regulatory mechanisms of non-lethal function of caspase by performing a comprehensive analysis of proximal proteins of executioner caspase Drice. It is important to point out that the authors use an array of techniques, from western blot to behavioral experiments, and that they generated several reagents, from fly lines to antibodies.

      This is an interesting work that would appeal to readers of multiple disciplines. As a whole these findings suggest that overexpression of Fas3G enhances a non-lethal caspase activation in ORNs, providing a novel experimental model that will allow for exploration of molecular processes that facilitate caspase activation without leading to cell death.

      We sincerely thank the reviewer for their highly positive evaluation of our study, particularly from a methodological perspective. We also greatly appreciate the valuable comments provided on our manuscript. In response, we have revised the manuscript in line with the reviewer’s suggestions. For further clarification, please refer to our detailed point-by-point responses provided below.

      Reviewing Editor comments:

      I am pleased to let you know that our reviewers found the results in your paper important and the evidence compelling. There are a few minor comments and a point was raised regarding figure 6 for which further details were asked. Please see the reviewer's comments. We are looking forward to receiving an updated version of your very interesting paper.

      We are grateful to you and the reviewers for dedicating time to review our manuscript and for providing insightful comments and suggestions. We have revised our manuscript in line with the reviewers' feedback. The major revision involves clarifying the two-choice preference assay presented in Figure 6. Details of these revisions are provided in our point-by-point responses to the reviewers’ comments below. The new and extensively modified sections of text are highlighted in blue. We have introduced new panels (Figures 1D, 3D, 6B, and 6C) and made modifications to Figure 6A. The previous Figure 1D has been relocated to Figure 1–figure supplement 1B. Additionally, our detailed responses to the reviewers’ comments are also highlighted in blue within the point-by-point response section. With all concerns and suggestions from the Editor and reviewers addressed, our conclusion—that executioner caspase is proximal to Fasciclin 3 which facilitates non-lethal activation in Drosophila olfactory receptor neurons—is now more robustly supported. We are confident that our revised manuscript makes a significant contribution to the fields of caspase function and neurobiology. We remain hopeful that the reviewers will find it suitable for publication in eLife.

      Reviewer #1 (Recommendations for the authors):

      The main comment here is related to Figure 6, which needs to be better explained. First, if the results in Figure 6B and C are conducted with young flies, why is the preference index close to 0? Aren't these young flies more attracted to ACV? Second, what are the results with Dronc-RNAi and DroncDN alone? These should be shown to more accurately assess the outcome of Fas3G expression with and without Dronc inhibition. Third, if Fas3G overexpression induces non-lethal caspase activation and a behavioral change, why does Dronc inhibition enhance (and not suppress) this behavioral change?

      We sincerely thank the reviewer for the comment. We used one-week-old young flies for the two-choice preference assay. We found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (New Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (New Figure 6B). Since our hypothesis is that non-lethal caspase activation suppresses attraction behavior, and that inhibiting caspase activation could enhance attraction, we used the milder experimental condition for subsequent analyses.

      In the original manuscript, we did not test Dronc inhibition alone because caspase activation is rarely observed in young flies (as demonstrated in Figure 3C, New Figure 3D, etc), suggesting that Dronc inhibition during this stage would not affect behavior. This hypothesis is further supported by previous research showing that inhibition of caspase activity in aged flies restores attraction behavior but has no effect in young flies (Chihara et al., 2014). To validate this hypothesis, we conducted the two-choice preference assay again, including caspase activity inhibition by Dronc<sup>DN</sup> expression alone. As expected, Dronc inhibition alone did not alter behavior in young flies (New Figure 6C).

      We also observed that Fas3G overexpression promotes a weak, though not statistically significant, enhancement in attraction behavior. Importantly, simultaneous inhibition of caspase activity further enhanced attraction behavior (New Figure 6C). These results suggest that Fas3G overexpression has a dual function: one aspect promotes attraction behavior, while the other induces non-lethal caspase activation. In this context, non-lethal caspase activation appears to counteract the behavioral response, acting as a regulatory brake. To address the reviewer’s comments comprehensively, we included the New Figure 6B and replaced the original Figure 6B and C with New Figure 6C. Additionally, we revised the manuscript text as follows:

      Using a two-choice preference assay with ACV (Figure 6A), we found that 16 hours of starvation combined with 25% ACV in the trap elicited a robust attraction behavior to the vinegar (Figure 6B). In contrast, 4 hours of starvation with 1% ACV in the trap resulted in milder attraction behavior, with the preference index value being close to 0 but still showing a positive trend (Figure 6B). Under the milder experimental condition, we first confirmed that inhibition of caspase activity through expressing Dronc<sup>DN</sup> didn’t affect attraction behavior in young adults (Figure 6C), consistent with a previous report (Chihara et al., 2014). We then observed that the overexpression of Fas3G, which activates caspases, did not impair attraction behavior. Instead, it rather appeared to enhance the tendency for attraction behavior (Figure 6C), suggesting that Fas3G promotes attraction behavior. Finally, we found that inhibiting Fas3G overexpression-facilitated non-lethal caspase activation by expressing Dronc<sup>DN</sup> strongly promoted attraction to ACV (Figure 6C). Overall, these results suggest that Fas3G overexpression has a dual function: it enhances attraction behavior while also triggering non-lethal caspase activation, which counteracts the behavioral response, functioning as a regulatory brake without causing cell death.
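
      The response above interprets preference index values near 0 as weak attraction, but the excerpt does not state how the index is computed. As a minimal sketch, assuming the common two-choice definition (flies in the odor trap minus flies in the control trap, divided by all flies that chose), the calculation might look like the following. The function name and the example counts are hypothetical and are not taken from the manuscript.

```python
def preference_index(n_odor: int, n_control: int) -> float:
    """Two-choice preference index under an ASSUMED definition.

    PI = (n_odor - n_control) / (n_odor + n_control)

    +1 means every choosing fly entered the odor (e.g. ACV) trap,
    -1 means every choosing fly entered the control trap,
     0 means no net preference.
    """
    if n_odor < 0 or n_control < 0:
        raise ValueError("counts must be non-negative")
    total = n_odor + n_control
    if total == 0:
        raise ValueError("no flies made a choice")
    return (n_odor - n_control) / total


# Hypothetical counts, for illustration only.
strong_attraction = preference_index(30, 10)
no_preference = preference_index(10, 10)
```

      Under this assumed definition, a value close to 0 with a positive trend corresponds to only slightly more flies in the ACV trap than in the control trap, which leaves room to detect an enhancement of attraction.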

      Other minor comments are below:

      The authors should clarify that while they refer to their caspases reporters as "non-lethal caspase reporters", these are caspase reporters in general and can report both lethal and non-lethal caspase activation. Of course, the only surviving cells are those that experience non-lethal caspase activation.

      We thank the reviewer for pointing this out. This reporter can monitor caspase activation with high sensitivity only if the cell is capable of transcribing and translating the reporter proteins following cleavage of the probe, most likely in living cells. However, as mentioned, using the term “non-lethal reporter” is not accurate, as additional experiments are required to determine whether caspase activation leads to cell death. Therefore, we removed the term “non-lethal” and referred to this reporter simply as a highly sensitive caspase reporter in the revised manuscript.

      Some of the figure panels could be better described in the legends (e.g. Figure 1E, 1F, 4E, 4F).

      We thank the reviewer for the comment. We have included additional explanations in the figure legends throughout the manuscript.

      In Figure 3C, the OL and AL regions should be marked in the figure as done in Figure 1C.

      We thank the reviewer for the comment. We have marked OL and AL regions in Figure 3C and Figure 2A as in Figure 1C.

      In Figures 4A and B, the authors should rearrange the order of the x-axis to reflect the order that appears in the text (Dronc first).

      We thank the reviewer for the comment. We have rearranged the order of labels on the X-axis to reflect the order that appears in the text.

      In Figure 6B, do the colors imply anything? If so, it should be explained. 

      We thank the reviewer for pointing this out. We used the colors such that the light blue bars represent Fas3G overexpression, while the red dots indicate caspase-activated conditions. In the New Figure 6C, we used light blue dots for Fas3G overexpression and red bars for caspase-activated conditions. We have added an explanation in the figure legend. In addition, we have removed the colors in Figure 4B and have added an explanation in the figure legend in Figure 4D.

      Reviewer #2 (Recommendations for the authors):

      (1) For the methods section make a table for the lines, the way they are listed is not the most easy to read.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3.

      (2) Lines 420 to 573, not sure why this is here, this information should be in the figure or figure legend, or make a table if necessary.

      We thank the reviewer for the comment. We have listed the detailed genotypes corresponding to each figure in Table S4.

      (3) Blocking with donkey serum, do you get better results than bovine?

      We have not conducted tests with bovine serum for immunohistochemistry. Donkey serum was used throughout the manuscript.

      (4) The Methods section is very thorough and complete but I recommend the use of tables to organize some of the reagents used.

      We thank the reviewer for the comment. We have listed the fly strains used in this study in Table S3 and the detailed genotypes corresponding to each figure in Table S4.

      (5) Line 647 spells out LC-MS/MS.

      We thank the reviewer for pointing this out. We have provided the full spelling as “liquid chromatography-tandem mass spectrometry”.

      (6) Line 808 spells out ACV (apple cider vinegar) and MQ (MilliQ water).

      We thank the reviewer for pointing this out. We have provided the full spelling as suggested.

      (7) Figure 1D. Why do you use only females? 

      We thank the reviewer for pointing this out. In the original manuscript, we analyzed female flies by crossing each Gal4 strain with UAS-Drice-RNAi; Drice::V5::TurboID virgin females. In this case, because Pebbled-Gal4 is located on the X chromosome, we could only use female flies for the analysis. To address this, we examined the expression pattern in male flies by crossing each Gal4 virgin female with UAS-Drice-RNAi; Drice::V5::TurboID males. As expected, Drice expression is also mostly depleted when using the ORN-specific Gal4 driver, Pebbled-Gal4, suggesting that Drice expression is predominantly observed in ORNs in males as well. We have added New Figure 1D to present the male data. The original Figure 1D, which presents female data, has been relocated to Figure 1–figure supplement 1B.

      (8) Figure 1D. Be clear about the LN driver used here in the figure.

      We thank the reviewer for pointing this out. We used Orb<sup>0449</sup>-Gal4 driver (#63325, Bloomington Drosophila Stock Center), which has been previously characterized as an LN-specific Gal4 driver (Wu et al., 2017). Accordingly, we have revised “LN-Gal4” to “Orb<sup>0449</sup>-Gal4” throughout the manuscript.

      (9) Figure 1 and Supplementary Figure 1 images are very good. I would recommend the use of a different color palette, to help visualization for colorblind readers (such as this reviewer).

      We apologize for any inconvenience caused. We chose the green/magenta color pair because these are complementary colors, which generally provide better contrast compared to other color pairs. Therefore, we have decided to continue using this pair. To enhance readability, we have intensified the magenta signal in the New Figure 1D and Figure 1–figure supplement 1B. We retained the original magenta signal levels in Figure 1C and Figure 1–figure supplement 1A to avoid oversaturation. Instead, we have kept the Streptavidin-only signal images alongside the color merged images for clarity. We hope these adjustments improve the visualization and help you better interpret the figures.

      (10) Based on Supplementary Figure 1 and based on the fact that Figures 1B and 1C use males, why not also use males for Figure 1D?

      Please refer to our reply to comment #7. We have now included the results for males in the New Figure 1D, which show a similar expression pattern to that observed in females. The results for females originally shown in Figure 1D have been relocated to Figure 1–figure supplement 1B.

      (11) Why were the old versus young flies used for Figure 3 raised at 29C? Why not let the animals age at 25C? The use of 29C throughout the manuscript is not clear.

      We thank the reviewer for pointing this out. Most of the UAS fly strains used in this study, including a Fas3G overexpression line, are UASz lines, which exhibit relatively low expression levels compared to UASt lines (DeLuca and Spradling, 2018). Since the Gal4/UAS system is temperature-dependent (Duffy, 2002), we performed most of the experiments at 29°C to enhance gene expression.

      For the aging experiments, we chose to rear flies at 29°C because higher temperatures accelerate aging including neuronal aging (Okenve-Ramos et al., 2024), allowing for faster experimentation, and 29°C is within the ecologically relevant range of temperatures for Drosophila melanogaster (Soto-Yéber et al., 2018). Additionally, we confirmed that a subset of olfactory receptor neurons undergo aging-dependent caspase activation at both 29°C and 25°C, as shown in New Figure 3D.

      (12) Why not use an Or42b specific GAL 4 for the aging experiment? What are the odorants that are detected by this ORN? Are any of the odorants behaviorally relevant compounds?

      We thank the reviewer for pointing this out. While the exact odorant detected by Or42b neurons has not been fully determined, these neurons innervate the DM1 region in the antennal lobe, which is activated by ACV. Additionally, Or42b neurons have been shown to be required for attraction behavior to ACV (Semmelhack and Wang, 2009), supporting the relevance of ACV for the behavioral experiment. We used Or42b-Gal4 to confirm that Or42b neurons undergo aging-dependent caspase activation, which is detectable using the MASCaT system (New Figure 3D). Furthermore, we verified that these neurons exhibit aging-dependent caspase activation at both 25°C and 29°C (New Figure 3D).

      (13) Make the panel lettering in all the figures bigger or bold.

      We thank the reviewer for pointing this out. We have increased the size of the panel lettering and made it bold throughout the figures to improve the readability.

      (14) Line 806. MilliQ water.

      We thank the reviewer for pointing this out. We have ensured that “MilliQ water” is consistently spelled this way throughout the manuscript.

      (15) Figure 6. The authors need to be more clear on the experimental conditions. At what time of the day was this experiment performed? Was the experiment run in DD? Were the flies young or old?

      We thank the reviewer for pointing this out. We performed the assay using one-week-old young flies under constant dark conditions during both the starvation period and the assay. We have added a detailed explanation in the Methods section. For clarity, we have also revised Figure 6A to provide a more detailed explanation of the experimental setup.

      References

      Chihara T, Kitabayashi A, Morimoto M, Takeuchi K-I, Masuyama K, Tonoki A, Davis RL, Wang JW, Miura M. 2014. Caspase inhibition in select olfactory neurons restores innate attraction behavior in aged Drosophila. PLoS Genet 10:e1004437.

      DeLuca SZ, Spradling AC. 2018. Efficient expression of genes in the Drosophila germline using a UAS promoter free of interference by Hsp70 piRNAs. Genetics 209:381–387.

      Duffy JB. 2002. GAL4 system in Drosophila: a fly geneticist’s Swiss army knife. Genesis 34:1–15.

      Okenve-Ramos P, Gosling R, Chojnowska-Monga M, Gupta K, Shields S, Alhadyian H, Collie C, Gregory E, Sanchez-Soriano N. 2024. Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22:e3002504.

      Semmelhack JL, Wang JW. 2009. Select Drosophila glomeruli mediate innate olfactory attraction and aversion. Nature 459:218–223.

      Soto-Yéber L, Soto-Ortiz J, Godoy P, Godoy-Herrera R. 2018. The behavior of adult Drosophila in the wild. PLoS One 13:e0209917.

      Wu B, Li J, Chou Y-H, Luginbuhl D, Luo L. 2017. Fibroblast growth factor signaling instructs ensheathing glia wrapping of Drosophila olfactory glomeruli. Proc Natl Acad Sci U S A 114:7505–7512.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197
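
      For readers unfamiliar with the term, an image difference function compares a memorised snapshot with the view at the current position, and a rotational version repeats that comparison for every azimuthal rotation of the current view. The following is a minimal, generic sketch of the idea. It is not the model used in the manuscript. The grayscale list-of-rows image format, the root-mean-square measure, and all function names are assumptions made for illustration.

```python
import math


def image_difference(snapshot, view):
    """Root-mean-square pixel difference between two non-empty grayscale
    panoramas of equal size, each given as a list of rows of numbers."""
    if len(snapshot) != len(view) or any(
        len(a) != len(b) for a, b in zip(snapshot, view)
    ):
        raise ValueError("images must have the same shape")
    squared = [
        (a - b) ** 2
        for row_s, row_v in zip(snapshot, view)
        for a, b in zip(row_s, row_v)
    ]
    return math.sqrt(sum(squared) / len(squared))


def rotate(view, shift):
    """Shift every row `shift` columns to the right with wrap-around,
    which corresponds to a yaw rotation of a 360-degree panorama."""
    return [[row[(c - shift) % len(row)] for c in range(len(row))] for row in view]


def rotational_idf(snapshot, view):
    """Image difference for every possible column shift of the current view."""
    n_cols = len(view[0])
    return [image_difference(snapshot, rotate(view, s)) for s in range(n_cols)]


def best_heading(snapshot, view):
    """Column shift that minimises the difference, and that minimum value."""
    diffs = rotational_idf(snapshot, view)
    shift = min(range(len(diffs)), key=diffs.__getitem__)
    return shift, diffs[shift]
```

      In a sketch like this, a homing agent would move in the direction along which the minimum difference decreases. The region of space from which such descent leads back to the snapshot location is what is usually meant by the catchment area.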

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20 and we changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study focussed on the performance in goal finding in cluttered environments, we now also address the issue of learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us in a forthcoming paper to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance but at the fictive new location within the dense environment, indicating that the bees assumed the nest to be located within the dense environment, and therefore that vision played a crucial role for homing. We thus never meant to suggest that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ a combination of multiple mechanisms for navigation.

      We refined the paragraph about possible alternative homing mechanisms. See lines 218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See lines 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that catchment volumes would provide quantitatively more detailed information than catchment slices. Nevertheless, since our goal was to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, catchment slices, which provide qualitatively similar information to catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain homing behaviour in natural as well as artificial environments (Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can be used not only to replicate but also to predict a given outcome and to shape the design of experiments. Here, we used the models to shape the experimental design, as this approach does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses different from the one suggested by the models, it becomes interesting to look at the flight history. If we had found an alignment between the model and the behaviour, looking at the history would have become much less interesting. Thus, our results raise an interest in looking at the entire flight history, which will require effort not only on the recording procedure but also conceptually. At the moment, the underlying mechanisms of learning during outbound, inbound, exploration, or orientation flights remain elusive, which makes it difficult to test a hypothesis. A detailed description of the flights during the entire bee history would enable us to speculate about alternative models to the one tested in our study, but would remain limited in testing them.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16
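      The comparison described in this response, in which one memorised snapshot serves as the reference and another is rotated against it, can be sketched as follows. This is only an illustration under stated assumptions: the function names `rot_idf` and `best_rotation` are hypothetical, the images are tiny lists of grey values rather than rendered panoramas, and the root-mean-square pixel difference is used as one common difference measure, which is not necessarily the exact measure used in the manuscript.

```python
import math


def rot_idf(reference, view):
    """Rotational image difference function (illustrative sketch).

    `reference` and `view` are panoramic images given as lists of rows;
    each row is a list of grey values covering 360 degrees of azimuth,
    so a rotation about the vertical axis is a circular column shift.
    Returns one root-mean-square difference per column shift of `view`.
    """
    n_rows, n_cols = len(reference), len(reference[0])
    curve = []
    for shift in range(n_cols):
        total = 0.0
        for r in range(n_rows):
            for c in range(n_cols):
                diff = view[r][(c + shift) % n_cols] - reference[r][c]
                total += diff * diff
        curve.append(math.sqrt(total / (n_rows * n_cols)))
    return curve


def best_rotation(reference, view):
    """Return (shift, difference) at the minimum of the rotIDF."""
    curve = rot_idf(reference, view)
    shift = min(range(len(curve)), key=curve.__getitem__)
    return shift, curve[shift]


# Hypothetical 2 x 4 "panorama" standing in for memory 0.
memory_0 = [[0, 1, 2, 3],
            [4, 5, 6, 7]]
# The same scene seen after turning by one column.
memory_1 = [[3, 0, 1, 2],
            [7, 4, 5, 6]]

shift, difference = best_rotation(memory_0, memory_1)
```

      In this toy case the minimum of the curve lies at a shift of one column with zero difference, because the second image is an exact rotated copy of the first. With real snapshots taken at different positions the minimum is shallower, and its depth indicates how well the two views match.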

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as a hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we will welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology,8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000). Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in The Gambia, covering the period between December 2014 and May 2017. The majority of samples were analyzed using a SNP barcode with the Spotmalaria panel, with a subset validated through WGS. Identity-by-descent (IBD) was calculated as a measure of genetic relatedness and spatio-temporal patterns of the proportion of highly related infections were investigated. Related clusters were detected at the household level, but only within a short time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its  longitudinal design and the inclusion of asymptomatic cases. The  laboratory analysis using the Spotmalaria platform combined and  supplemented with WGS is solid, and the authors show a linear  correlation between the IBD values determined with both methods,  although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant  filtering for WGS and subsequent IBD analysis, and (2) creating a  consensus barcode from the spot malaria panel and WGS data and  subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better  explanation of results to help readers follow more easily. Despite  familiarity with genotyping, WGS, and IBD analysis, I found myself  needing to reread sections. While the figures are generally clear and  well-presented, the text could be more digestible. The aims and  objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. But please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without  corresponding figures or tables in the main manuscript, referring only  to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers  should be included in a table or figure in the main manuscript.

      We agree with the reviewer's suggestion to move the prevalence of drug resistance markers from the supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If any other figure or table should be moved to the main manuscript, please let us know.

      (3) The study uses samples from 2 different studies. While these are  conducted in the same villages, their study design is not the same,  which should be addressed in the interpretation and discussion of the  results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 to May  2017. The authors should assess how this might have impacted their  temporal analysis and conclusions drawn. In addition, it should be  clarified why and for exactly in which analysis the samples from Dec  2016 - May 2017 were excluded as this is a large proportion of your  samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.
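      The temporal comparison mentioned in this response can be sketched as follows: barcode pairs are binned by the time between their collection dates, and the fraction of pairs at or above the relatedness threshold (IBD >= 0.5) is computed per bin. The function name, the bin boundaries in months, and the example pairs are hypothetical and chosen only for illustration; the sketch also omits the grouping by location that the actual analysis applies.

```python
def related_fraction_by_interval(pairs, bins, threshold=0.5):
    """Fraction of related pairs per time interval (illustrative sketch).

    pairs: iterable of (months_apart, ibd) tuples, one per barcode pair.
    bins:  list of (label, low, high); a pair falls in a bin when
           low <= months_apart < high.
    Returns {label: fraction}, with None for bins without pairs.
    """
    counts = {label: [0, 0] for label, _, _ in bins}  # [related, total]
    for months_apart, ibd in pairs:
        for label, low, high in bins:
            if low <= months_apart < high:
                counts[label][1] += 1
                if ibd >= threshold:
                    counts[label][0] += 1
                break
    return {
        label: (related / total if total else None)
        for label, (related, total) in counts.items()
    }


# Hypothetical pairs: (months between samples, IBD fraction).
example_pairs = [
    (0.5, 0.9), (1.0, 0.2), (1.5, 0.7),
    (2.5, 0.3), (3.0, 0.1), (4.0, 0.6), (4.5, 0.0),
]
example_bins = [("0-2 months", 0, 2), ("2-5 months", 2, 5)]

fractions = related_fraction_by_interval(example_pairs, example_bins)
```

      With these made-up values, two of three pairs are related in the first bin and one of four in the second, which mirrors the direction of the decline described above without reproducing any of the reported numbers.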

      (4) Based on which criteria were samples selected for WGS? Did the  spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places,  or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Lines 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51% (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is  clear. However, since you do not use absolute values of IBD, but a  classification of related (>=0.5 IBD) vs. unrelated (<0.5), it  would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have  IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be  underestimating relatedness.

      a. How sensitive is this correlation to the nr of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R²) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).
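      For readers who want to see how the agreement statistics named above follow from the binary classification (related if IBD >= 0.5, unrelated otherwise), a minimal sketch is given below. It treats the genome-based call as the reference and the barcode-based call as the prediction, which is an assumption about the direction of the comparison; the function name and the example values are hypothetical and do not reproduce the reported figures.

```python
def classification_agreement(barcode_ibd, genome_ibd, threshold=0.5):
    """Agreement between barcode- and genome-based relatedness calls.

    A pair is called related when its IBD value is >= threshold.
    Genome-based calls are treated as the reference (an assumption).
    Returns sensitivity, specificity, precision and Cohen's kappa.
    """
    tp = fp = tn = fn = 0
    for b, g in zip(barcode_ibd, genome_ibd):
        predicted, reference = b >= threshold, g >= threshold
        if predicted and reference:
            tp += 1
        elif predicted:
            fp += 1
        elif reference:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    n = tp + fp + tn + fn
    observed = ratio(tp + tn, n)
    # Chance agreement from the marginal totals of both classifications.
    expected = ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "kappa": ratio(observed - expected, 1 - expected),
    }


# Hypothetical IBD values for six barcode pairs.
barcode_values = [0.9, 0.8, 0.2, 0.1, 0.6, 0.3]
genome_values = [0.95, 0.7, 0.1, 0.6, 0.2, 0.05]

stats = classification_agreement(barcode_values, genome_values)
```

      In this toy example each class contains one misclassified pair, so sensitivity, specificity and precision are all 2/3, while kappa is lower (1/3) because half of the observed agreement would be expected by chance.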

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which  relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly  related pairs are not a measure of genetic diversity. Please revise the  manuscript and figures accordingly.

      We agree with the fact that IBD is not a direct measure of genetic diversity, even though both are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” when appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia, however this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic  diversity" and explain how it relates to IBD and individual-level  relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the drug-resistant markers, one can observe a big fluctuation in the dhps haplotypes, which go down from 75% to 20% and then up and down again later. The authors should investigate this in more detail, as dhps is related to SP resistance, which could be important for seasonal malaria chemoprophylaxis, especially since the mutations in dhfr seem near-fixed in the population, indicating high levels of SP resistance at some of the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could be resulting from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods of intense transmission at the beginning of the rainy season are interspersed by long periods of low to no transmission. This raises several questions about how this transmission pattern impacts the spatiotemporal distribution of circulating parasite strains. Knowledge of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on parasite phenotypes (e.g., the emergence or loss of drug resistance genotypes), and the analysis, through the parasites' genetic nature, of the duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to answer these questions by making clever use of the different recombination rates, as measured through the proportion of genomes with identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household, village, and region) and temporal (i.e., high, low, and the corresponding transitions) levels. The authors show that a large fraction of infections are polygenomic and stable over time, resulting in high recombinational diversity (Figure 2). Since the number of recombination events is expected to increase with time or with the number of mosquito bites, IBD allows them to investigate the connectivity between spatial levels and to measure the fraction of effective recombinational events over time. The authors demonstrate the epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes within the same household, and how parasite-relatedness gradually disappears over time (Figure 3). Moreover, they show that transmission intensity increases during the transition from dry to wet seasons (Figure 4). If there is no drug selection during the dry season and if resistance incurs a fitness cost, it is possible that alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on malaria parasite population genetics. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants residing in four villages in the Upper River Region of The Gambia and tested for malaria parasite positivity. The parasites from the positive samples were genotyped using a genetic barcode and/or whole genome sequencing, followed by a genetic relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and self-explanatory. The methods are adequately described, providing a solid foundation for the findings. While there are no unexpected results, it is reassuring to see the anticipated outcomes supported by actual data. The conclusions are generally well-supported; however, the discussion on the burden of asymptomatic infections falls outside the scope of the data, as no specific analysis was conducted on this aspect and it was not stated as part of the aims of the study. Nonetheless, the recommendation to target asymptomatic infections is logical and relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. We find Figure S1 already dense; adding more detail would undermine its purpose as an easily readable summary of our pipeline. Readers seeking an in-depth explanation of our pipeline may prefer to read the methods section instead. We are very much committed to crediting the authors of the tools that were essential for us to create our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, which were both cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details that we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains non-negative integers for each different allele at a given locus (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).
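
      The encoding described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the string labels for calls are hypothetical, and which allele is coded 0 versus 1 is an assumption, since the response only states that 0 and 1 denote the two alleles and -1 denotes missing or mixed calls.

```python
# Hypothetical sketch of the allele encoding described above.
# Assumption: each call is given as one of "ref", "alt", "mixed", "missing",
# and the reference allele is coded 0 (the response does not specify this).
MISSING = -1


def encode_call(call: str) -> int:
    """Return 0 or 1 for the two alleles of a bi-allelic locus, -1 otherwise."""
    if call == "ref":
        return 0
    if call == "alt":
        return 1
    # Mixed and missing calls are both set to -1 so that they are
    # ignored during IBD estimation.
    return MISSING


def encode_barcode(calls):
    """Encode a whole barcode (list of calls) as a list of integers."""
    return [encode_call(c) for c in calls]
```

      For example, a barcode with calls ref, alt, mixed, missing would be encoded as 0, 1, -1, -1 under these assumptions.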

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased with respect to the likelihood of transmission. Excluding pairs of households with fewer than 5 comparisons was necessary to ensure statistical robustness in our analyses. Besides, this does not necessarily restrict the analysis to households with a high number of cases, as it is the total number of pairs between households that must be at least 5 (for instance, these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).
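
      The arithmetic behind the two examples above can be sketched as below. This is an illustration only: it assumes that each case contributes one barcode and that every barcode in one household is compared with every barcode in the other; the function names are hypothetical.

```python
# Hypothetical sketch: number of pairwise comparisons between two households.
# Assumption: one barcode per case, all cross-household pairs are compared.
def n_pairs_between(cases_a: int, cases_b: int) -> int:
    """Number of barcode pairs formed between two different households."""
    return cases_a * cases_b


def passes_cutoff(cases_a: int, cases_b: int, min_pairs: int = 5) -> bool:
    """True if the household pair yields at least `min_pairs` comparisons."""
    return n_pairs_between(cases_a, cases_b) >= min_pairs
```

      Under these assumptions, 1 case vs 5 cases gives 5 pairs and 2 cases vs 3 cases gives 6 pairs, so both pass a cutoff of 5, whereas 2 cases vs 2 cases (4 pairs) would not.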

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with less than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentage of related barcodes (IBD ≥ 0.5) was virtually identical between the “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ² = 1.33, p value > 0.24).

      (3) Line 124: add a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We used the QUAL > 10000 filter to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that, above a certain threshold, selecting variants on extremely high QUAL values is not relevant, as such values translate into infinitesimally low probabilities (10^(-QUAL/10)) of the variant call being wrong. We then decided to use a minimal population minor allele frequency (MAF) of 0.01 to keep a variant, as this will make the IBD calculation more accurate (Taylor et al., 2019). The variant filtering was carried out with the MAF > 0.01 filter, resulting in 27,577 filtered SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).
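
      The relationship between QUAL and the probability of a wrong call quoted in this response is the standard Phred scaling. A minimal sketch (the function name is hypothetical):

```python
# Phred-scaled QUAL -> probability that the variant call is wrong,
# using the relation p = 10^(-QUAL/10) quoted in the response.
def qual_to_error_prob(qual: float) -> float:
    return 10 ** (-qual / 10)
```

      With this relation, the minimal QUAL of 132 reported above corresponds to an error probability of roughly 6e-14, which illustrates why thresholds far above this value add little.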

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For a locus to be comparable between two samples, it has to be homozygous in both. To be informative, it must be comparable and at least one of the two samples must carry the population minor allele. We borrowed this term and definition from the hmmIBD software, which directly yields the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, way above the recommended 200 loci in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).
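
      The definitions of comparable and informative loci given in this response can be sketched as below. This is an illustration under stated assumptions, not the hmmIBD implementation: samples are assumed to be encoded as lists of 0/1 with -1 for mixed or missing calls, and `minor_allele` is a hypothetical per-locus list giving the population minor allele (0 or 1).

```python
# Hypothetical sketch: count comparable and informative loci for one pair.
# Assumptions: 0/1 allele coding, -1 for mixed or missing calls, and a
# precomputed list of the population minor allele at each locus.
def count_sites(sample_a, sample_b, minor_allele):
    comparable = 0
    informative = 0
    for a, b, minor in zip(sample_a, sample_b, minor_allele):
        # A locus is comparable only if it is homozygous (called) in both samples.
        if a == -1 or b == -1:
            continue
        comparable += 1
        # It is informative if at least one sample carries the minor allele.
        if a == minor or b == minor:
            informative += 1
    return comparable, informative
```

      A pair would then be retained only if these counts reach the thresholds stated in the text (for genomes, at least 100 informative SNPs).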

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of molecular genotyped SNPs, these 12 loci have to be removed, especially because we did find a consistent discrepancy between genotyped and WGS SNPs, which cannot be tested if these SNPs are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A locus where both alleles are called can result from two distinct haploid genomes being present or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the locus can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in the next sentence, Lines 192-195). We clarified our approach in the methods (Lines 190-192) and in the legends of Figures S2 and S3.
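
      The within-sample MAF rule described in this response can be sketched as below. This is an illustration only, not the authors' pipeline: the function name, the read-count inputs, and the string labels are hypothetical; the default cutoff of 0.2 is taken from the reviewer's comment, and treating a locus exactly at the cutoff as heterozygous is an assumption.

```python
# Hypothetical sketch of a within-sample MAF cutoff for calling one locus.
# Assumptions: read counts per allele are available; cutoff 0.2 as mentioned
# in the reviewer's comment; MAF equal to the cutoff counts as heterozygous.
def call_locus(ref_reads: int, alt_reads: int, cutoff: float = 0.2) -> str:
    total = ref_reads + alt_reads
    if total == 0:
        return "missing"  # no coverage at this locus
    within_sample_maf = min(ref_reads, alt_reads) / total
    if within_sample_maf >= cutoff:
        return "mixed"  # both alleles well supported: heterozygous call
    # Below the cutoff, only the major allele is kept.
    return "ref" if ref_reads > alt_reads else "alt"
```

      Under this sketch, a locus with 95 reference and 5 alternate reads is called as the major (reference) allele, while a 60/40 split is called mixed.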

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 8 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our 30 minimum comparable sites cutoff seems sufficient.

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program; we computed Fws according to the methods of Manske et al. (2012).
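
      For readers unfamiliar with the metric, a simplified sketch of Fws is given below, using the commonly quoted form Fws = 1 - Hw/Hs, where Hw is within-sample heterozygosity and Hs is population heterozygosity. This is an assumption-laden illustration and not the authors' computation: the published procedure, as we understand it, estimates the Hw/Hs relationship across minor-allele-frequency bins, which this sketch does not reproduce. Function names and inputs are hypothetical.

```python
# Simplified, hypothetical sketch of Fws = 1 - Hw/Hs for bi-allelic loci.
# Assumptions: inputs are per-locus frequencies of one allele, within the
# sample and in the population; no MAF binning or regression is performed.
def heterozygosity(p: float) -> float:
    """Expected heterozygosity at a bi-allelic locus with allele frequency p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def fws(within_sample_freqs, population_freqs) -> float:
    hw = sum(heterozygosity(p) for p in within_sample_freqs)
    hs = sum(heterozygosity(p) for p in population_freqs)
    if hs == 0:
        raise ValueError("population heterozygosity is zero; Fws is undefined")
    return 1.0 - hw / hs
```

      In this simplified form, a sample with only homozygous loci has Hw = 0 and therefore Fws = 1.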

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (23) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. We have now added to Figure 6 the available barcodes, along with their level of relatedness to the dominant genotype of each continuous infection.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the existing populations? I am not sure the lack of IBD is enough to show that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population structure would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However, we agree that this might be confusing and will consider barcodes to be related if they have an IBD between 0.5 and 0.9, while excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).
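
      The revised, non-overlapping categories described in this response can be sketched as below. The thresholds are those stated in the text; the function name and labels are hypothetical.

```python
# Hypothetical sketch of the non-overlapping relatedness categories:
# identical (IBD >= 0.9), related (0.5 <= IBD < 0.9), unrelated otherwise.
def classify_pair(ibd: float) -> str:
    if ibd >= 0.9:
        return "identical"
    if ibd >= 0.5:
        return "related"
    return "unrelated"
```

      With this scheme, each pair falls into exactly one category, so the counts in the categories sum to the total without double counting.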

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping" or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of "barcode genotyping".

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to stick with “molecular genotyping” as this lets us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjointed and does not provide a clear build-up to the gap in knowledge that the study is attempting to fill. Please revise.

      Introduction is now thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      Introduction is now thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that a population of parasites may adapt to environmental conditions (such as seasonality) through selection of the fittest genotypes. For instance, antimalarial exposure selects for parasites with specific mutations in drug resistance related genes, and this even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to keep the figure concise. For this reason, we included Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” sounds maybe a bit off, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. please revise. Again, the lower  parasite relatedness during the transition from low to high was linked  to recombination occurring in the mosquito but what about infection  burden shifting to naive young children? Is there a role for host  immunity in the observed reduction in parasite-relatedness during the  transition period?

      This text has been rewritten (Lines 356-361).

      About the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      About the role of host immunity on parasite relatedness across time and space, our dataset is too small to divide it into different age groups. Further studies should address this very interesting question.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K⁺. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCl depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two-compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and also take into account Reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K⁺]-induced membrane potential changes are involved, factors other than cell volume changes appear to influence T2 changes. Our follow-up study shows that, for the same T2 change, the volume change differs between the following two situations: pure osmotic volume changes versus [K⁺]-induced volume changes. For example, for the same T2 change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results at the upcoming ISMRM 2025 and are also preparing a manuscript to report them shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      [Reviewer 1, Response 2] In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neurons during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sub>2</sub> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 above.

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sub>2</sub> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sub>2</sub> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sub>2</sub> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.
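
      The point at issue in Comment 5 can be illustrated with a minimal sketch. It assumes two non-exchanging compartments with hypothetical T2 values, a hypothetical population fraction, and hypothetical echo times (none of these numbers come from the study); only the count of 50 echo times follows the response above. A monoexponential fit to such a signal returns a single apparent T2 that lies between the two compartment values.

```python
import math

def biexp_signal(te_ms, frac_a, t2_a_ms, t2_b_ms):
    """Noise-free signal from two non-exchanging compartments (hypothetical model)."""
    return frac_a * math.exp(-te_ms / t2_a_ms) + (1 - frac_a) * math.exp(-te_ms / t2_b_ms)

def fit_monoexp_t2(te_list, signal_list):
    """Log-linear least-squares fit of S = S0 * exp(-TE / T2); returns T2 in ms."""
    n = len(te_list)
    log_signal = [math.log(s) for s in signal_list]
    mean_te = sum(te_list) / n
    mean_log = sum(log_signal) / n
    sxy = sum((te - mean_te) * (ls - mean_log) for te, ls in zip(te_list, log_signal))
    sxx = sum((te - mean_te) ** 2 for te in te_list)
    slope = sxy / sxx  # slope of log(S) versus TE equals -1/T2
    return -1.0 / slope

# Hypothetical values chosen only for illustration
echo_times = [10.0 * (i + 1) for i in range(50)]  # 50 echo times, 10 to 500 ms
signal = [biexp_signal(te, 0.7, 80.0, 300.0) for te in echo_times]
apparent_t2 = fit_monoexp_t2(echo_times, signal)
```

      In this sketch `apparent_t2` falls between 80 and 300 ms, which is why the slice placement described above matters: the further the slice is from the pellet-media interface, the smaller the contribution of the second compartment.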

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCL solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba<sup>2+</sup> concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we have already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of cell volume changes to the observed MR parameter changes, we additionally discuss the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCL depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echos and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge that additional experiments, such as cell volume recording, would further strengthen the claims of this study. We will perform them in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sub>2</sub> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements, and plan to incorporate them in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that SH-SY5Y cells were patch clamp-measured in their adherent state. For MRI, on the other hand, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minute incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameter changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sub>2</sub>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sub>2</sub> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] We thank the reviewer for finding this error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have been made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To avoid any misleading implication that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K<sup>+</sup> levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the manuscript to address this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists.

      Weaknesses: 

      There were several areas which might be strengthened by additional consideration from a methodological perspective.

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now expanded our explanation of continuity-based regression discontinuity. In particular, we now explain “fuzzy”, and add emphasis on the two separate empirical approaches (continuity and local-randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
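
      As a rough illustration of what “fuzzy” implies for estimation: the effect of the treatment actually received is commonly obtained by scaling the jump in the outcome at the cutoff by the jump in treatment uptake. The sketch below is not taken from the authors' pipeline. The function name and the outcome jump are hypothetical, and only the uptake figure of roughly 10% comes from the text above.

```python
def fuzzy_rd_effect(outcome_jump, uptake_jump):
    """Wald-type fuzzy RD estimate: outcome discontinuity scaled by the uptake discontinuity."""
    if uptake_jump == 0:
        raise ValueError("no jump in treatment uptake: the effect is not identified")
    return outcome_jump / uptake_jump

# Hypothetical outcome jump of 0.02 units at the cutoff;
# uptake jump of 0.10 (about 10% of the sample stayed in school because of ROSLA).
effect_on_compliers = fuzzy_rd_effect(0.02, 0.10)
```

      With a small uptake jump the scaling factor is large, so even modest noise in the outcome jump is amplified in the estimated effect.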

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Cattaneo & Titiunik, 2022, Annual Review of Economics), as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics; this paper suggests that the cause is not researcher degrees of freedom but rather inappropriate inferential methods). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We (self) limited our analytic flexibility by using pre-registration (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size: there is no established practice, and bandwidths are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added, yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate).
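
      The trade-off described above can be sketched on simulated data. This is a deliberately simplified illustration, not the authors' analysis: it assumes a sharp design, a uniform kernel, one hypothetical unit per birth month, a made-up cubic trend, and no robust bias correction. All function names are illustrative.

```python
import random
import statistics

def linear_intercept(points):
    """Ordinary least-squares line through (x, y) points; returns the fitted value at x = 0."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x

def rd_estimate(xs, ys, bandwidth):
    """Sharp RD estimate at cutoff 0: difference between the two local-linear intercepts."""
    left = [(x, y) for x, y in zip(xs, ys) if -bandwidth <= x < 0]
    right = [(x, y) for x, y in zip(xs, ys) if 0 <= x < bandwidth]
    return linear_intercept(right) - linear_intercept(left)

def simulate(rng, noise_sd, true_effect=0.0, curvature=1e-4):
    """Hypothetical data: months from cutoff, smooth cubic trend, optional jump at the cutoff."""
    xs = list(range(-60, 60))
    ys = [curvature * x ** 3 + (true_effect if x >= 0 else 0.0) + rng.gauss(0.0, noise_sd)
          for x in xs]
    return xs, ys

rng = random.Random(0)

# Bias: without noise, the estimate drifts away from the true effect (0) as the window widens.
xs, ys = simulate(rng, noise_sd=0.0)
bias_narrow = abs(rd_estimate(xs, ys, 2))
bias_wide = abs(rd_estimate(xs, ys, 24))

# Variance: with noise, narrow windows give much less stable estimates.
def estimate_sd(bandwidth, n_sims=200):
    estimates = []
    for _ in range(n_sims):
        sim_xs, sim_ys = simulate(rng, noise_sd=1.0)
        estimates.append(rd_estimate(sim_xs, sim_ys, bandwidth))
    return statistics.stdev(estimates)

sd_narrow = estimate_sd(2)
sd_wide = estimate_sd(24)
```

      In this toy setting the 2-month window has the smaller bias but the larger spread across simulations, and the 24-month window shows the reverse.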

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator, effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail in Section 4.4.2 of Cattaneo et al., 2019, pp. 45–51, “A practical introduction to regression discontinuity designs: foundations”). Yet, you are very correct in highlighting potential overfitting issues, as MSE-optimal bandwidths are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022: 

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelop calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrobust or RDHonest) calculate robust confidence intervals; for more details see Armstrong and Kolesár (2020). For a summary on MSE-bandwidths see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For more in-depth handling see the Cattaneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (Section 5.5, Cattaneo et al., 2019, pp. 106–107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.
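
      The local-randomization idea can be sketched as follows: units born within a narrow window around the cutoff are treated as if assigned at random, and the groups are compared directly. The example below is a generic permutation test on made-up records, not the authors' implementation. The function names, the record format, and the outcome values are all hypothetical.

```python
import random

def split_by_window(records, window_months):
    """Split (months_from_cutoff, outcome) records into control and treated outcomes.

    Month 0 is the first affected birth month; negative months are unaffected.
    """
    control = [y for m, y in records if -window_months <= m < 0]
    treated = [y for m, y in records if 0 <= m < window_months]
    return control, treated

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(control, treated, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical records: (birth month relative to September 1957, outcome)
records = [(-3, 1.2), (-1, 1.0), (-1, 1.1), (0, 1.05), (0, 0.95), (2, 1.3)]
control_1m, treated_1m = split_by_window(records, 1)
p_value = permutation_p_value(control_1m, treated_1m)
```

      Passing `window_months=5` to `split_by_window` gives the wider comparison; the wider window adds units at the cost of comparing people born further apart.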

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis; we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF<sub>10</sub>)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (‘estimate parameters’ vs ‘quantify evidence’ camps). If your goal is to accurately estimate a parameter it would be odd to have a strong null prior, yet (in our opinion) when estimating point-null BFs a wide default prior gives far too much evidence in support of the null. In point-null BF testing the Savage-Dickey density ratio is the ratio between the height of the prior at zero and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing: to ensure our conclusions in favor of the null were not caused solely by an overly wide prior (the preregistered orange distribution), we decided to report the 3 narrower priors (blue ones).
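
      The prior sensitivity of a point-null Bayes factor can be illustrated with a minimal Python sketch. It assumes a conjugate normal model (a normal prior centred at zero and a normally distributed effect estimate with known standard error), which is a simplification and not the model fitted in the manuscript. The estimate (0.05) and standard error (0.1) are hypothetical numbers chosen for illustration.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(estimate: float, se: float, prior_sd: float) -> float:
    """BF01 = posterior density at 0 / prior density at 0, for a N(0, prior_sd^2)
    prior and a normal likelihood for the estimate. BF10 is its inverse,
    i.e. prior height at zero over posterior height at zero."""
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * estimate / se**2
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


# Hypothetical estimate and SE; prior SDs match those discussed in the text.
for prior_sd in (0.5, 1.0, 1.5, 2.5):
    bf01 = savage_dickey_bf01(estimate=0.05, se=0.1, prior_sd=prior_sd)
    print(f"prior sd = {prior_sd}: BF01 = {bf01:.2f}")
```

      With the data held fixed, widening the prior lowers its height at zero while the posterior barely changes, so the evidence for the null grows purely as a function of the prior scale.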

      Alternative Bayesian null hypotheses testing methods such as using Bayes Factors to test against a null region and ‘region of practical equivalence testing’ are less prior sensitive, yet both methods demand the researcher (e.g. ‘us’) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined any effect within this boundary is taken as evidence in support of the null hypothesis.

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability; therefore, a very narrow window around the cutoff needs to be used. See Figures 2.1 and 2.2 (in Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.
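
      The core idea of a local randomization comparison can be sketched in a few lines of Python. This is a simplified difference in means within a symmetric window, not the Bayesian model used in the manuscript. The record layout, the function name, and the toy values are hypothetical.

```python
from statistics import mean


def local_randomization_diff(records, window_months):
    """Difference in mean outcome (treated minus control) within a window.

    records: iterable of (months_from_cutoff, outcome) pairs, where
    months_from_cutoff is 0 for the first birth month affected by the policy
    and -1 for the last unaffected month (assumed coding).
    """
    control = [y for m, y in records if -window_months <= m < 0]
    treated = [y for m, y in records if 0 <= m < window_months]
    return mean(treated) - mean(control)


# Toy data (made up): outcome values for births around the cutoff.
toy = [(-2, 9.0), (-1, 10.0), (-1, 12.0), (0, 11.0), (0, 13.0), (1, 15.0)]
print(local_randomization_diff(toy, window_months=1))  # 1-month window
print(local_randomization_diff(toy, window_months=5))  # 5-month window
```

      Widening the window adds observations but also admits individuals who differ more in birth date, which is the reason the window has to stay narrow for the exchangeability assumption to be plausible.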

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (it ties into #3); we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of education attainment with the global neuroimaging cohorts was similar across sub-cohorts of new individuals, we conducted post hoc Bayesian analysis on eight more sub-cohorts of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normally distributed prior with a mean of 0 and an sd of 1. Total surface area shows a remarkably replicable association with education attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seem to hold across cohorts.

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after secondary education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Woollett & Maguire, 2011, Curr. Biol.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary wouldn’t. Moreover, higher education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendelian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done Mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Winsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.
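
      For readers unfamiliar with the term, winsorizing against Tukey (boxplot) fences can be sketched in Python as below. The 1.5 × IQR multiplier, the quartile method, and the choice to clip values to the fences are assumptions made for illustration; the exact rule used in the manuscript is not restated here.

```python
import statistics


def winsorize_tukey(values, k=1.5):
    """Clip values lying outside the Tukey fences to the nearest fence.

    k is the IQR multiplier (1.5 is the conventional boxplot choice, assumed here).
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Toy example: one extreme value is pulled in to the upper fence.
print(winsorize_tukey([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

      Unlike trimming, this keeps every observation in the sample and only limits how far the extreme ones can pull an estimate.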

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ to the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’, meaning that assignment to treatment only increases the probability of receiving it; in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
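
      The consequence of imperfect compliance can be made concrete with the standard Wald ratio for a fuzzy design, sketched below in Python: the jump in the outcome at the cutoff is divided by the jump in the probability of receiving treatment. This is a generic illustration and not necessarily the estimator implemented in the manuscript. The function name and the outcome jump of 0.05 are hypothetical; the 10% and 25% first-stage values are the proportions quoted above.

```python
def fuzzy_rd_wald(outcome_jump: float, treatment_jump: float) -> float:
    """Local average treatment effect for compliers: the discontinuity in the
    outcome divided by the discontinuity in treatment uptake (first stage)."""
    if treatment_jump == 0:
        raise ValueError("No first-stage jump: the effect is not identified.")
    return outcome_jump / treatment_jump


# The same hypothetical outcome jump implies different complier effects
# depending on how many individuals the policy actually moved.
for first_stage in (0.10, 0.25):
    late = fuzzy_rd_wald(outcome_jump=0.05, treatment_jump=first_stage)
    print(f"first stage = {first_stage:.2f}: implied complier effect = {late:.2f}")
```

      Read in the other direction, a given effect among compliers produces a smaller visible jump at the cutoff when only 10% of the sample is affected than when 25% is, which is the power limitation discussed in this response.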

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      Healthy volunteer bias can create two types of selection bias; crucially participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2018 article did IP-weight with the UK Biobank sample, Barcellos and colleagues 2023 (and 2018) do not, highlighting the following: “Although the sample is not nationally representative, our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see Appendix A.”

      The second (more acknowledged & arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point; this is a large limitation with every natural experimental design, yet in our case, this is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is again, another inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we stuck with only testing the variables we wanted to use to increase precision (also offering new neuroimaging covariates that didn’t exist in the literature base). One additional benefit of ROSLA was that the cutoff was decided years later on a variable that happened (date of birth) in the past – making it particularly hard for adolescents to alter their assignments.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked: brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf). Large effects of age can be seen in the global metrics; older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not match perfectly to their age; this is why we included the covariate ‘visit date’. This interplay can now be seen in our updated SI Figure 1 (recommended in #3), which shows the distributions of visit date, DOB, and age of scan. 

      In a valid RD design covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding the effect of ‘visit date’ and its quadratic term to be not related to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing the 1973 ROSLA policy change to not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in ‘false positive causal effects’ (which is not what we find).  

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 

      The novelty of our study is its causal design. While we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals, this effect is probably not what we think we are measuring. Without IP-weighting it could even have a different sign. But more importantly, it is not variable X – it is the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample the easier it is for small unmeasured confounders to reach significance (Big data paradox). This in no way invalidates large samples; it is just that our thinking and how we handle large samples will hopefully change to a more causal lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes Factors using Jeffreys’ criteria. ‘Extreme evidence’ is only used once and it is about finding an associational effect of educational attainment on CSF (BF10 > 100). Upon Reviewer 1’s recommendation 7, we conducted eight replication samples (Sup. Figure 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now further added documentation to our code; including a readme describing what each script does. The analysis pipeline used is not ideal for replications as the package used for continuity-based RD (RDHonest) initially could not handle covariates – therefore we manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7). 

      Prof Kolesár added this functionality recently and future work should use the latest version of the package as it can correct for covariates. We have a new preprint examining the effect of 1972 ROSLA on telomere length in the UK Biobank using the latest package version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To ensure maximum availability of such innovations, we will ensure the most up-to-date version of this script becomes available on this GitHub link (https://github.com/njudd/EduTelomere).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system. 

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state. The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, and working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represent aggregated signals from multiple cells (1, 2). Single-cell approaches for transcriptional profiling and cellular electrophysiology would address this concern and are appropriate for future studies. 

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804.

      (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript. The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing the model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes.

      (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We distinguish prior work on the “long-standing and well-supported model”, which investigated the roles of Tbx5 and Tbx3 independently, from the present work, which examines the coordinated role of Tbx5 and Tbx3. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study is the first to evaluate the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identity from working myocardium. 

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestion that a direct comparison between Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion. 

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3).

      Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5 deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5/Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs.

      In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggests a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by broadening the gene panel and including comparisons with native working ventricular myocardial cells. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His Bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text. 

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) High-resolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929. 

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer #2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1), to which the reviewer alludes in earlier comments. In contrast, this study aimed to explore a different model: that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated role of Tbx3 and Tbx5 in conduction system identity remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the coordinated role of Tbx5 and Tbx3 to be examined for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provide consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity. 

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes in chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinKcreERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible Fate Mapping and morphologic analysis is a valuable recommendation for future studies. 

      Reviewer #1 (Recommendations for the authors):

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Previous work suggested the prediction that VCS-specific genetic ablation of both the TBX3 and TBX5 would transform fast-conducting adult VCS into cells resembling working myocardium, eliminating specialized CCS fate. The current study suggests that this prediction is at least to some extent accurate.

      We appreciate Reviewer #1’s summary and recognition of our study. As the reviewer notes, the simultaneous deletion of Tbx3 and Tbx5 in the mature ventricular conduction system (VCS) suggests a conversion of VCS to "ordinary" ventricular working myocytes. To our knowledge, this represents a novel observation and experimental model that uniquely captures the combined roles of these essential T-box transcription factors. We believe that this model offers a valuable platform for further investigation into the transcriptional mechanisms underlying conduction system specialization.

      (1) The huge effort made to generate the DKO model contrasts with the limited efforts made to study the mechanism. Conditional deficiency of Tbx3 and Tbx5 creates an artificial situation that is useful for addressing fundamental mechanistic questions. The authors provide a rather superficial analysis of the changes in the VCS upon deletion of these two critically important factors and do not provide really novel insights into their requirement/function in the VCS gene regulatory network and epigenetic state. So to what extent do VCS cardiomyocytes (CMs) from Tbx3/5 DKO mice resemble "simple" working myocardium? To what extent do these cells acquire the working myocardial (epigenetic) state, do these cells have an epigenetic memory of the Tbx3/Tbx5+ history, is the enhancer usage between the modified VCS CMs and the working CMs similar or not, etc.? The assumption that the authors' data indicate that the DKO VCS CMs simply acquire a ventricular working "fate" is unlikely. Following this reasoning, the reverse experiment to induce Tbx3 and Tbx5 expression in working CMs would result in complete conversion to VCS CMs, which is also unlikely.

      To answer such questions, transcriptomic and epigenetic state analysis, electrophysiologic analysis (e.g. patch-clamp), cell/subcellular level analysis, etc. would be required, as well as a comparison of the changed state of the DKO VCS CMs to that of working CMs.

      This initial study focused on generating the Tbx3:Tbx5 double-conditional knockout model and characterizing the resulting physiological and molecular changes within the VCS. We analyzed transcriptomic markers of fast conduction (VCS), slow conduction (nodal), and non-conduction (working myocardium). Additionally, we applied optical mapping to evaluate the physiological consequences of the double knockout, which allowed a calculated AP of the VCS to be generated. We agree that a more in-depth mechanistic investigation of the VCS transformation upon Tbx3/Tbx5 deletion by transcriptomic or cellular electrophysiology could provide a deeper understanding of the precise transcriptional/epigenetic state of the VCS in the double knockout and clarify whether there is a partial or complete conversion of VCS cells to a simple working myocardial phenotype. The suggestions by the reviewer will be considered for future studies.

      (2) Tbx3 stimulates BMP-TGFb signaling (e.g. positive loop between Tbx3-Bmp2), which in turn stimulates EMT and modulates the behavior of endocardial and mesenchymal cells. Did the authors investigate the impact of Tbx3/5 DKO on non-CM cells in and around the VCS? (see also comment 1). The insulation of the AVB for example could be a Tbx3/5 non cell autonomous target.

      We appreciate the Reviewer’s suggestion to examine the impact of Tbx3/Tbx5 deletion on non-CM cells surrounding the VCS. While this is an intriguing avenue for future exploration, it falls outside the scope of the current study, which focused on the cardiomyocyte-specific roles of Tbx3 and Tbx5 in maintaining adult VCS identity.

      (3) The MinK-Cre line used (from the Moskowitz lab) also recombines in the AVN (Arnolds et al 2011). The authors do not mention changes in the AVN, and systematically call the line VCS specific (which refers to the AVB, BB, PVCS I assume). This could also impact the PR interval. Please address.

      The MinK-Cre line recombines in the atrioventricular bundle (AVB) and bundle branches (BB). It also recombines in cardiomyocytes adjacent to the atrioventricular node (AVN), which we previously interpreted as the penetrating portion of the His bundle into the AVN. This line does not recombine in the vast majority of physiologic nodal cells, if any. We also assessed nodal conduction parameters by invasive electrophysiologic (EP) studies. Our data showed that non-VCS parameters, including sinus node recovery time, AV node recovery time, and atrial and ventricular effective refractory periods, remained within normal ranges in Tbx3:Tbx5-deficient mice (please see Figure 2I). These findings indicate that AVN function is preserved in the VCS-specific double knockout, reinforcing the specificity of the observed conduction defects to the ventricular conduction system.

      (4) Did the authors also investigate the electrophysiological changes in the (EGFP+) DKO VCS CMs? Would these resemble the properties of ventricular working CMs, or would they still show some VCS properties? (see also comment 1).

      We performed electrophysiologic analysis of the double knockout by optical mapping. Optical mapping provides tissue-level resolution, capturing the functional behavior of clusters of thousands of cells simultaneously, rather than individual cells. While this technique does not achieve single-cell resolution, it allows for a comprehensive assessment of electrophysiological changes across the VCS region. Single cell electrophysiology is a good idea for future studies. 

      (5) Throughout the manuscript, the authors use "patterning" and "fate", which are applicable to development and differentiation, not to the situation where a gene is removed from fully differentiated cells in an adult organism resulting in a change of these cells. Perhaps more appropriate are "state" change and the requirement for "homeostasis/maintenance" of state.

      We appreciate the Reviewer’s concern regarding the terminology used to describe changes in VCS cell identity. To ensure precision and uniformity, we replaced terms such as “fate” and “patterning” with “state” or “maintenance” to reflect the shift in cellular characteristics in a fully differentiated adult tissue context. 

      Minor:

      (1) Please provide all data points in bar graphs.

      We have incorporated individual data points into the bar graphs as suggested, ensuring enhanced transparency and clarity in the data presentation.

      (2) Formally, gene expression levels between samples are not normally distributed. The Welch t-test used here assumes a normal distribution. Therefore, nonparametric tests should be used.

      We appreciate Reviewer #1’s consideration of the appropriate statistical approach to the qPCR data and clarify our statistical approach here. Normality within each experimental group was assessed using the Shapiro-Wilk test. Between-group comparisons were conducted using Welch t-test, and multiple comparisons were corrected using the Benjamini & Hochberg method to control the false discovery rate (FDR) (71). If a significant difference was detected between two groups (t-test FDR < 0.05) but normality was rejected in any of the compared groups (Shapiro-Wilk P < 0.05), a non-parametric Wilcoxon rank-sum test was used for verification. A significant group-mean difference was confirmed at one-tailed Wilcoxon P≤0.05 (detailed in Supplementary Data Set I). Furthermore, we have updated the qRT-PCR information in each figure and their respective legends as follows. Statistical analysis was performed using R version 4.2.0. We have included a new Supplementary Data Set I, detailing the statistical analysis of qRT-PCR data. Additionally, we have revised the Methods/Statistics section to detail the applied statistical analysis. 
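
      As an illustration only, the decision rule described above can be sketched in Python. This is not the authors' R code: the function names, argument names, and example p-values are hypothetical, and the per-test p-values (Welch, Shapiro-Wilk, Wilcoxon) are assumed to have been computed elsewhere. The sketch shows the Benjamini-Hochberg adjustment and the conditional Wilcoxon verification step.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_significant(fdr, shapiro_p_groups, wilcoxon_p, alpha=0.05):
    """Apply the rule from the response (hypothetical helper).

    fdr: BH-adjusted Welch t-test p-value for one comparison.
    shapiro_p_groups: Shapiro-Wilk p-values of the compared groups.
    wilcoxon_p: one-tailed Wilcoxon rank-sum p-value (used only if needed).
    """
    if fdr >= alpha:
        return False  # no significant difference by the Welch test
    normality_rejected = any(p < alpha for p in shapiro_p_groups)
    if not normality_rejected:
        return True  # Welch result stands without verification
    return wilcoxon_p <= alpha  # non-parametric verification


# Made-up Welch p-values for four genes, purely for demonstration.
welch_p = [0.01, 0.04, 0.03, 0.005]
fdr_values = benjamini_hochberg(welch_p)
```

      Keeping the adjustment separate from the decision rule means the same correction can be applied once across all comparisons before each comparison is checked individually.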

      (3) Some of the panels of figures are tiny and cannot be evaluated. For example, in Figure 1B the actual data (expression of Tbx3/5) is impossible to see.

      We appreciate the Reviewer’s observation and have revised the figures to improve visual clarity and ensure that the presented data are easily interpretable by readers.

      Reviewer #2 (Recommendations for the authors):

      Additional Experiments, Data, Analysis:

      (1) Comparisons between both single knockouts and double knockouts at the phenotypic level are needed. In some instances, the data is shown (e.g., mortality and EKG) but direct statistical comparison is not performed. In other instances (optical mapping and gene expression), data with single knockouts are not shown. If combined VCS Tbx3/Tbx5 deletion does not change the phenotype of the VCS Tbx5 single deletion, this should be explicitly stated and discussed.

      We appreciate Reviewer #2’s suggestion to compare the phenotypic outcomes of the Tbx3 and Tbx5 single conditional knockout models with those observed in Tbx3/Tbx5 double conditional knockout model. We have expanded the discussion section of our manuscript to incorporate a more detailed comparison between the double Tbx3/Tbx5 model and the single Tbx5 and Tbx3 models [1-5], highlighting the distinct phenotypic outcomes of the single and double knockouts.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265.

      (2) Genome-wide expression analysis including working myocardium would provide stronger evidence for interconversion of cell states. Ideally, this would include single knockouts.

      We agree that a genome-wide expression analysis, including a direct comparison with working myocardium, would provide more comprehensive insights into cell state transitions in Tbx3:Tbx5-deficient VCS cells. Additionally, incorporating single knockout models into such analyses would further clarify the distinct and cooperative contributions of Tbx3 and Tbx5 to maintaining VCS identity. This is a good suggestion for future studies.

      (3) This may not be essential to support the authors' claims, but the addition of epigenetic data from single and double KO VCS using ATAC-seq (which can be performed with relatively small numbers of cells) could provide stronger evidence for cell state changes of the kind hypothesized by the authors.

      We agree that epigenetic data such as ATAC-seq would complement transcriptional analyses and provide insight into chromatin states that underlie the observed cellular reprogramming. This is a good suggestion for follow-up studies to further characterize the molecular state of Tbx3:Tbx5-deficient VCS cells.

      (4) Additional clarification of the optical mapping experiments to exclude alternative interpretations like focal right bundle branch block and to include single knockouts for comparison - if the Tbx5 single KO looks the same as the double KO that would be very important to know and would directly affect interpretation of the experiment.

      Right septal optical mapping preparation involved removing the right ventricular free wall to directly image the right ventricular septum, which contains the VCS. In a healthy mouse, there are two peak components of the optical action potential upstroke, the first peak due to the activation of the VCS and the second due to the activation of the ventricular cardiomyocytes. Importantly, in Tbx3:Tbx5 double-conditional knockout mice, the first peak was absent, rather than delayed, indicating loss of fast conduction through the VCS. This absence suggests a shift in VCS cells toward a ventricular working myocardial phenotype, rather than a regional conduction block or delayed propagation through a structurally intact VCS.

      Previous studies from our group have extensively characterized the effect of single Tbx5 knockout on the VCS in murine hearts [1, 2, 3]. Arnolds et al. demonstrated that VCS-specific Tbx5 deficiency results in significant slowing of VCS conduction, evidenced by prolonged PR and QRS intervals, along with lengthening of the atrio-Hisian interval, His duration, and Hisioventricular interval [1]. Although both single Tbx5 knockout and Tbx3:Tbx5 double knockout mice exhibit slowed conduction through the ventricular conduction system, our optical mapping studies reveal distinct differences in their electrophysiological phenotypes. Burnicka-Turek et al. showed that the single knockout of Tbx5 in the VCS leads to a shift toward a pacemaker cell state, evidenced by ectopic beats originating in the ventricles and inappropriate automaticity [3]. During spontaneous beats, electrical impulses propagated retrogradely, from the ventricles to the atria [3]. Whole-cell patch-clamp recordings confirmed that Tbx5-deficient VCS cells displayed action potentials resembling those of pacemaker cells, characterized by slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization [3]. In contrast, our current study on the VCS-specific Tbx3:Tbx5 double knockout demonstrates a loss of VCS-specific fast conduction. Optical mapping demonstrated the absence of the initial upstroke corresponding to VCS activation in the His bundle region, indicating a shift in the VCS cells toward a ventricular working myocardium state. This loss of fast conduction properties highlights a fundamental distinction between single and double knockouts, suggesting that both Tbx3 and Tbx5 are required to maintain VCS identity and function.

      (1) D. E. Arnolds et al., “TBX5 drives Scn5a expression to regulate cardiac conduction system function,” J. Clin. Invest., vol. 122, no. 7, pp. 2509–2518, Jul. 2012, doi: 10.1172/JCI62617.

      (2) Moskowitz, I.P., Pizard, A., Patel, V.V., Bruneau, B.G., Kim, J.B., Kupershmidt, S., Roden, D., Berul, C.I., Seidman, C.E., Seidman, J.G. (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131(16):4107-4116. 

      (3) Burnicka-Turek, O., Broman, M.T., Steimle, J.D., Boukens, B.J., Petrenko, N.B., Ikegami, K., Nadadur, R.D., Qiao, Y., Arnolds, D.E., Yang, X.H., Patel, V.V., Nobrega, M.A., Efimov, I.R., Moskowitz, I.P. (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circ Res. 127(3):e94-e106.

      Methods:

      (1) Additional methods on FACS are required. The methods section references a paper from 2004 (reference 67) that describes the flow sorting of embryonic cardiomyocytes. However, flow cytometric isolation of intact adult cardiomyocytes, which the authors describe in the present work, is a distinct technique and generally requires special equipment. These need to be described in more detail to be fully replicable.

      We thank Reviewer #2 for highlighting the need to provide additional details regarding our flow cytometric isolation of adult VCS cardiomyocytes. While we referenced earlier methods, we agree that isolating adult cardiomyocytes requires specialized approaches. Therefore, we revised the Methods section to include a detailed description of the equipment, procedures, and adaptations specific to isolating intact adult VCS cells to ensure full replicability.

      Minor Corrections:

      (1) Figure 1D. Please add a statistical test for mortality between the double conditional KO and the Tbx5 conditional KO.

      We have revised Figure 1D to include the statistical test comparing mortality between the Tbx3:Tbx5 double conditional knockout and the Tbx5 conditional knockout cohorts.

      (2) Figure 2A, 2I, 3A: Please include all individual data points not just a bar graph with error bars.

      We have added all individual data points to the bar graphs as recommended, enhancing the transparency and clarity of the data presentation.

      (3) Figure 2A: Please consider separate graphs for PR and QRS with appropriately scaled Y-axis so differences are easier to see.

      We appreciate Reviewer #2’s suggestion and fully agree with it. As a result, we have revised Figure 2A to include separate graphs for PR and QRS intervals, each with appropriately scaled Y-axes. This adjustment enhanced both the readability and the clarity of the observed differences.

      (4) Figure 3 G-K: The figure would be easier to interpret for the reader if genotypes were shown in the figure not just in the legend.

      We agree with Reviewer #2’s suggestion and have revised Figure 3 accordingly by adding genotype labels directly to the histological sections in Panels G-K. This update improves clarity, making the data easier for readers to interpret without needing to refer to the figure legend.

      (5) Figure 4A, C: Are vertical axes mislabeled? They say, "CON VCS and TBX5OE VCS". Please double-check axis labels and data on the graph.

      We appreciate the Reviewer bringing the mislabeling of the vertical axis in Figure 4 to our attention. We have corrected the labeling errors and ensured consistency between the graph and the underlying data.

      (6) Legend to Supplementary Figure 6. Says "Tbx3:Tbx3" instead of "Tbx3:Tbx5".

      We thank Reviewer #2 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

      (7) Discussion. The authors write, "In Tbx3:Tbx5 double VCS knockout, we observed repression of fast VCS markers and also repression of Pan-CCS markers transcribed throughout the entire CCS." The term 'repression' has a specific connotation with transcription regulators that is likely not intended in this context so perhaps 'reduced expression' would be better here?

      We agree with Reviewer #2 and have replaced “repression” with “reduced expression” throughout the text. The revised sentence reads:

      “In the Tbx3:Tbx5 double VCS knockout, we observed a reduction in the expression of both fast VCS markers and Pan-CCS markers transcribed throughout the entire CCS.”

      (8) Discussion, the authors write, "This study combined with prior literature (1, 7, 11, 15, 26, 53, 54) indicates that the presence of both Tbx3 and Tbx5 is necessary for the specification of the adult VCS (Figure 7)." Since this work presents data from an adult conditional deletion, it's not clear how it informs our understanding of the specification, which occurs during development. Perhaps "maintenance of VCS fate" would be more appropriate here?

      We agree with Reviewer #2 that the term “maintenance of VCS fate” is more appropriate in the context of our study. Accordingly, we have updated the text to reflect this terminology.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2B: It is hard to see the IF images. What is the cardiac structure studied? Maybe a dashed line and a label to define the region and the structure represented will help. As the authors have described that the crosses used contain a reporter allele (R26-EYFP), a clearer way to show these results would be to include images of the linage traced cells with the reporter, not only to identify the CCS structure analyzed, but also to demonstrate that the deletion is specific to the MinK-creERT expression in the CCS.

      We appreciate the Reviewer’s suggestion to improve the clarity of Figure 2B by delineating the cardiac structures analyzed. In response, we have added dashed lines and labels to highlight the regions of interest within the IF images. Unfortunately, we were unable to capture high-quality EYFP fluorescence images for these sections. However, to address this concern, we microdissected the region shown in the IF images and performed FACS to isolate EYFP-positive cells from this specific area. These sorted cells were subsequently used for qPCR analysis, which confirmed the presence of Tbx3 and Tbx5 in control samples and the successful deletion of both genes in the double-conditional knockout samples (Figure 2C, middle panel). We believe this approach provides robust evidence for the specificity of MinK-CreERT expression in the CCS and the efficiency of gene deletion in the targeted region.

      (2) 3G-K: The authors describe the absence of morphological defects in the tissue sections of adult hearts from the different genotypes analyzed. Although this reviewer agrees that there seem to be no major defects in the general cardiac morphology of these animals, the higher magnification images suggest some tissue differences at the level of the AVN especially in the double HET, double HOMO, and the Tbx3 HOMO. Is that due to the section plane used? If so, more appropriate and comparable sections must be provided. Again, as the crosses used by the authors contain a reporter allele (R26-EYFP), it is required that the authors show that the CCS cells, where deletions are induced, are still present in equivalent areas in the mutants and that they remain in similar numbers only failing to maintain their specification into CCS due to Tbx3 and Tbx5 loss of function.

      This analysis will reinforce the authors' claims on the role of Tbx5/Tbx3 in this process.

      We thank the reviewer for their thorough assessment and thoughtful feedback on our histological analysis. The higher magnification images in Figure 3G-K do not specifically present the AVN. These sections primarily represent areas of the ventricular conduction system (VCS), particularly the His bundle and bundle branches, rather than the AVN itself. We do not believe that the observed morphological differences are related to AVN tissue, and there were no functional deficits attributable to the AVN in the double knockout. Furthermore, the MinK-Cre allele used in this study does not recombine in the AVN proper. We agree that confirming the presence of CCS cells in equivalent regions across different genotypes is crucial. Our approach using FACS-based isolation of EYFP-positive cells from the VCS, followed by qPCR analysis, provides evidence that these cells remain present in double conditional knockouts, although they fail to maintain their specialized gene expression profile. This reinforces our conclusion that Tbx3 and Tbx5 are essential for maintaining the molecular identity of CCS cells, rather than their physical presence.

      (3) Figure 4: The authors performed molecular analysis by qPCR and WB in Tbx5/Tbx3 double mutants to demonstrate that CCS cells lose the expression of CCS genes and express working myocardium genes. Could this be further demonstrated by ISH, HCR, or IF together with lineage tracing to provide evidence that these changes are located where the CCS tissues are in the control embryos? Analysis of 2 or 3 of these markers of each type on tissue sections would be enough.

      We thank the Reviewer for their insightful suggestion regarding additional validation of our molecular findings through ISH, HCR, or IF combined with lineage tracing. However, we would like to clarify that the molecular analyses we performed by qPCR and WB were conducted on EYFP-positive cells that were specifically isolated from the ventricular conduction system (VCS) region of both control and double conditional knockout (dCKO) mice. These EYFP-positive cells were obtained through fluorescence-activated cell sorting (FACS), ensuring that our analyses were confined to the targeted VCS population. Alternate approaches are appropriate for future studies to investigate the precise genomic and molecular nature of the transformation observed in the double knockout.

      (4) Discussion: in the discussion section the authors conclude that the combined role of Tbx5/Tbx3 is critical for the specification of the adult VCS. However, as the Tbx5/Tbx3 loss of function conditions are only induced in adult animals 6 weeks old, would it be more appropriate that their function is the maintenance of the VCS cell fate and that if not present these cells return to the working myocardium fate? If the authors believe that these genes are involved in the induction of VCS specification in adults, then they need to demonstrate that, before the loss of function induction at 6 weeks, these cells are not yet specified as adult VCS.

      We appreciate the Reviewer’s clarification regarding terminology. We agree that our study focuses on adult-specific conditional deletion and thus reflects the maintenance, rather than the specification, of VCS cell fate. Accordingly, we have revised the text to explicitly state that Tbx3 and Tbx5 are critical for maintaining VCS identity in adult mice, and that their loss leads to a shift toward a working myocardial fate.

      Minor:

      (1) There is no consistency in the way the quantitative data is shown in graphs. There are some graphs showing only bars, other dot plots, and other a combination of both. The authors must homogenise the representation of quantitative data showing the different data points in dot plots and not in bar graphs.

      We have standardized the quantitative data presentation across all figures, by including individual data points in bar graphs, ensuring enhanced transparency and clarity.

      (2) Figure 3: The labels defining the genotypes corresponding to the different histological sections of adult hearts (Panels G-K) are missing. Panels J and K are not referenced in the text.

      We thank Reviewer #3 for highlighting these omissions. We have added the genotype labels to the histological sections in Panels G-K of Figure 3 to ensure clarity. Furthermore, we have now referenced Panels J and K in the Results and in the supplementary material; the revised passages are quoted below.

      “Histological examination of all four chambers demonstrated no discernible differences between VCS-specific Tbx3:Tbx5 double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and control (Tbx3<sup>+/+</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) mice, nor between the double-knockout (Tbx3<sup>fl/fl</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) and single-knockout models for either Tbx3 (Tbx3<sup>fl/fl</sup>;Tbx5<sup>+/+</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>) or Tbx5 (Tbx3<sup>+/+</sup>;Tbx5<sup>fl/fl</sup>;R26<sup>EYFP/+</sup>; MinK<sup>CreERT2/+</sup>). Ventricular muscle appeared normal without hypertrophy or myofibrillar disarray and no fibrosis was present (Figure 3G, 3I, 3J, and 3K, respectively).”

      “Additionally, we confirmed the absence of histological and structural abnormalities in these mice, aligning with previous findings (Figures 3A, 3F versus 3B, and 3K versus 3G, respectively)(1, 11).”

      (3) Typo: Supplementary Figure 6. Tbx3:Tbx3 double-conditional knockout: it should say Tbx5:Tbx3 double-conditional knockout.

      We thank Reviewer #3 for pointing out the typo. It has been corrected to: “Supplementary Figure 6. Tbx3:Tbx5 double-conditional knockout mice exhibit QRS prolongation”.

    1. Author response:

      General Statements

      We sincerely appreciate the constructive comments from the reviewers, which have significantly enhanced the clarity and rigor of our manuscript. Most of their suggestions have already been incorporated into the revised version. Additionally, we are conducting an additional experiment to further substantiate our conclusions, and preliminary data seem to support our findings.

      As pointed out by Reviewer #1, the regulation of neural circuit function by oligodendrocytes is currently a highly significant and actively studied topic. Our study demonstrates that regional heterogeneity in oligodendrocytes underlies the microsecond-level computational processes in the sound localization circuit. We believe this work represents a substantial contribution to the field.

      Description of the planned revisions

      • Evaluation of node formation along axons sparsely expressing eTeNT (related to Reviewer #2: comment 1)

      Based on the approximately 90% expression efficiency of A3V-eTeNT in NM neurons, we interpreted that vesicular release from NM axons was largely inhibited in the NL region, leading to the suppression of oligodendrogenesis and the subsequent emergence of unmyelinated segments. However, the effects of eTeNT on myelination are likely diverse, and a possibility remains that eTeNT directly disrupted axon-oligodendrocyte interactions, preventing oligodendrocytes from myelinating the axons expressing eTeNT.

      To test this possibility, we have initiated an additional experiment to evaluate formation of nodes along axons, while expressing eTeNT sparsely by electroporation. Preliminary results indicated that unmyelinated segments did not increase, supporting our original conclusion. After completion of the experiment, we will include the findings as a Supplementary Figure associated with Figure 6, which will provide a clearer understanding of how eTeNT influences myelination.

      Description of the revisions that have already been incorporated in the transferred manuscript

      • Revised terminology from "nodal distribution" to "nodal spacing" throughout the manuscript. (Reviewer #1: comment 1)

      • Emphasized that our analyses were focused on the main trunk of NM axons (Reviewer #1: comment 2) We explicitly stated throughout the manuscript that we analyzed the main trunk of NM axons and made it clear that our findings do not contradict those of Seidl et al. (J Neurosci 2010), who reported similar axon diameters in the midline and ventral NL regions (page 7, line 7).

      • Added an explanation of the maturation of the sound localization circuit (Reviewer #1: comment 3) We explained that chickens already localize sound well at hatching, emphasizing that the sound localization circuit is almost fully developed by E21 (page 4, line 12).

      • Emphasized the diverse effects of neuronal activity on oligodendrocytes (page 10, line 18) (Reviewer #1: comment 4)

      • Added details on the efficiency of A3V-eTeNT expression in NM neurons to the Results section (page 8, line 5) (Reviewer #2: comment 1)  

      • Made it clear in the legend for Figure 6D that the analysis was conducted under conditions in which most of the axons were labeled by A3V-eTeNT (page 31, line 9) (Reviewer #2: comment 2)

      • Clarified the rationale for statistical test selection (Reviewer #2: comment 3.1)

      • Reanalyzed all statistical data with appropriate methods using R (Reviewer #2: comment 3.2)

      • Clearly indicated which statistical tests were used in each figure (Reviewer #2: comment 3.3)

      • Clarified what n and N represent in each experiment (Reviewer #2: comment 3.4)

      • Added individual data points to bar graphs in Figures 5 and 6 (Reviewer #2: comment 3.5)

      • Emphasized the importance of comparing the ITD circuit with that of rodents (page 11, line 32) (Reviewer #2: comment 4) 

      • Softened the expressions related to "determine" (Reviewer #2: comment 5)

      Our study demonstrates that regional differences in the intrinsic properties of oligodendrocytes are a prominent determinant of nodal spacing patterns. However, we acknowledge that this does not establish direct causation. Accordingly, relevant expressions have been revised throughout the manuscript.

      • Added references (Reviewer #2: comment 6)

      • Corrected units in Figure 1G (Reviewer #2: comment 7)

      • Added discussion about the involvement of pre-nodal clusters in the regional differences in nodal spacing (page 9, line 35) (Reviewer #3: comment 1).

      Related to this issue, we have added new data to Figure 6I.

      • Discussed the possibility that the developmental origin and/or the pericellular microenvironment of OPCs contributed to the regional heterogeneity of oligodendrocytes (page 9, line 21) (Reviewer #3: comment 3).

      • Added references used in the response to reviewers into the main text.

      • Corrected the data error in Figure 6G, H

      • Corrected the dataset in Figure 3E

      We limited the data in Figure 3E–G to those measuring both myelin length and diameter simultaneously.

      Description of analyses that authors prefer not to carry out

      • Analysis in adult chickens (Reviewer #1: comment 3,4)

      The chick brainstem auditory circuit is nearly fully developed by E21, and we have also demonstrated that nodal spacing increases by approximately 20% while maintaining regional differences up to P9. Therefore, our study covers the period from pre-myelination to post-functional maturation, and we see little need to analyze older animals.

      • Functional evaluation of the efficiency of eTeNT suppression (Reviewer #2: comment 1)

      It is technically challenging to quantitatively assess the inhibition of vesicular release by eTeNT in NM axons, given that multiple synapses from different NM axons converge onto postsynaptic neurons. In addition, previous studies have already validated the efficacy of this construct in multiple species. Therefore, we will not electrophysiologically evaluate the extent of vesicular release inhibition by eTeNT in this study. Instead, we have provided clear evidence that A3V-eTeNT is expressed efficiently and leads to notable phenotypic changes, such as the inhibition of oligodendrogenesis (page 8, line 5).

      • Replacing figures with data averaged per animal (Reviewer #2: comment 3.4)

      Our study focuses on the distribution of morphological characteristics at the single-cell level rather than solely on group means. Averaging measurements per animal could obscure this cellular heterogeneity and potentially misrepresent our findings. Given that data distributions in our plots show clear distinctions, we believe that averaging per biological replicate is not essential in this case. If requested, we will be happy to provide the outputs of PlotsOfDifferences as supplementary source data files, similar to those used in eLife publications, for each figure.

      • Additional experiments to manipulate oligodendrocyte density (Reviewer #2: comment 5)

      We have already demonstrated that A3V-eTeNT reduces oligodendrocyte density in the NL region, and some of the arguments in our study are based on this result. Therefore, we think that further experiments are not necessary.

      • Verification of the presence of pre-nodal clusters (Reviewer #3: comment 1)

      We investigated the presence of pre-nodal clusters on NM axons, but we could not identify them in the immunohistochemistry of AnkG. As the occurrence of pre-nodal clusters varies depending on neuronal type, we consider that pre-nodal clusters are not prominent in the NM axons and that further experimental validation would not be necessary. Instead, we have added a discussion on the possibility that pre-nodal clusters contribute to regional differences in nodal spacing along NM axons (page 9, line 35).

      • Axon diameter measurements using EM (Reviewer #3: comment 2)

      This experiment was already done by Seidl et al. (2010), and hence, we do not think it necessary to repeat it. We believe that the relative differences in axon diameter between the regions could be adequately assessed using the optical approach with membrane-targeted GFP.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Despite evidence suggesting the benefits of neutralizing mucosa-derived IgA in the upper airway in protection against the SARS-CoV-2 virus, all currently approved vaccines are administered intramuscularly, which mainly induces systemic IgG. Waki et al. aimed to characterize the benefits of intranasal vaccination at the molecular level by isolating B cell clones from nasal tissue. The authors found that Spike-specific plasma cells isolated from the spleen of vaccinated mice showed significant clonal overlap with Spike-specific plasma cells isolated from nasal tissue. Interestingly, they could not detect any spike-specific plasma cells in the bone marrow or Peyer's patches, indicating that these nose-derived cells did not necessarily home to and reside in these locations, although the Peyer's patch is not a typical plasma cell niche - rather the lamina propria of the gut would have been a better place to look. Furthermore, they found that multimerization improves the antibody/antigen binding when the antibody is of low or intermediate affinity, but that high-affinity monomeric antibodies do not benefit from multimerization. Lastly, the authors used a competitive ELISA assay to show that multimerization could improve the neutralizing capacity of these antibodies.

      The strength of this paper is the cloning of multiple IgA from the nasal mucosae (n=99) and the periphery (n=114) post-SARS-CoV-2 i.n. vaccination to examine the clonal relationship of this IgA with other sites, including the spleen. This analysis provides novel insights into the nature of the mucosal antibody response at the site where the host would encounter the virus, and whether this IgA response disseminates to other tissues.

      There were also some weaknesses: 

      (1) The finding that multimerization improves binding and neutralization is not surprising as this was observed before by Wang and Nussenzweig for anti-SARS-CoV-2 IgA (authors should cite Enhanced SARS-CoV-2 neutralization by dimeric IgA. Wang et al., Sci. Transl. Med. 2021, 13:eabf1555).

      We have cited the paper, and the relevant sentence has been modified as follows (line 51-53); Recent studies have demonstrated that multimeric IgA is more effective and provides greater cross-protection than IgG and M-IgA (Okuya et al., 2020b) (Asahi et al., 2002) (Dhakal et al., 2018) (Asahi-Ozaki et al., 2004) (Wang et al., 2021).

      In addition, as far as I can tell we cannot ascertain the purity of fractions from the size exclusion chromatography thus I wasn't sure whether the input material used in Fig. 4 was a mixed population of dimer/trimer/tetramer?  

      The S-IgAs used in the SPR analysis in Fig. 4 consist of a mixture of dimers, trimers, and tetramers. The observed values indicate the average affinity of the S-IgAs. Please refer to the revised version (line 278-280).

      (2) The flow cytometric assessment of the IgA+ clones from the nasal mucosae was difficult to interpret (Fig. 1B). It was hard for me to tell what they were gating on and subsequently analyzing without an IgA-negative population for reference. 

      We have updated FACS plots to illustrate the presence of IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in Fig. 1B legend. Please find the relevant statements (line 115-120).

      (3) While the i.n. study itself is large and challenging, it would have been interesting to compare an i.m. route and examine the breadth of SARS-CoV-2 variant S1 binding for IgGs as in Fig. 2A. Are the IgA responses derived from the mucosae of greater breadth than systemic IgG responses?

      We appreciate your consideration. Recent reports indicate that some M-IgA monomers possess neutralizing activity that is equivalent to or less than that of IgGs. However, the opposite phenomenon has also been observed. These results suggest that the Fc does not merely correlate with the degree of increase in antibody reactivity or functionality. We believe the discrepancies in previous studies are due to variations in the binding modes between the epitope and paratope of each antibody clone. Nevertheless, oligomerization enhances the functionality of most monomeric antibody clones, suggesting that the multivalent S-IgA enables a mode of action that is challenging to achieve with a monomeric antibody. Please refer to the revised version (line 399-403).

      Alternatively, and easier, authors could do some comparisons with well-characterized IgG mAb for affinity and cross-reactivity as a benchmark to compare with the IgAs they looked at. Overall the authors did a good job of looking at a large range of systemic vs mucosal S1-specific antibodies in the context of an intra-nasal vaccination and this provides additional evidence for the utility of mucosal vaccination approaches for reducing person-to-person transmission. 

      We have summarized the characteristics of the four types of nasal IgAs in Fig.7 and in the Discussion. Please refer to the revised version (line 405-422).

      Reviewer #2 (Public Review): 

      Summary: 

      This research demonstrates the breadth of IgA response as determined by isolating individual antigen-specific B cells and generating mAbs in mice following intranasal immunization with SARS-CoV-2 Spike protein. The findings show that some IgA mAb can neutralize the virus, but many do not. Notably, immunization with Wuhan S protein generates a weak response to the omicron variant.

      Strengths: 

      Detailed analysis characterizing individual B cells with the generation of mAbs demonstrates the response's breadth and diversity of IgA responses and the ability to generate systemic immune responses. 

      Weaknesses: 

      The data presentation needs clarity, and results show mAb ability to inhibit SARS-CoV-2 in vitro. How IgA functions in vivo is uncertain.

      We conducted an additional experiment using a hamster model and confirmed that S-IgAs can protect against SARS-CoV-2 infection. Please refer to the revised version (line 349-373 and 431-438).

      Reviewer #1 (Recommendations For The Authors): 

      (1) Figure 1A shows antibody titers in nasal lavage fluid and serum of mice post intranasal vaccination with SARS-CoV-2 Spike protein. The Y-axis of this figure is labeled as "U/mg" however these units are not clearly defined. 

      The antibody titers are expressed as optical density (OD450) value per total protein in nasal lavage fluids or serum. Please find the relevant statements (line 113-114).
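      As an illustration of the normalization described in this response, a minimal sketch follows. It assumes the reported value is simply the OD450 reading divided by the total protein in the sample; the function name, the choice of milligrams for the protein amount, and the example numbers are hypothetical and are not taken from the manuscript.

```python
def normalized_titer(od450: float, total_protein_mg: float) -> float:
    """Return the OD450 reading per mg of total protein.

    Hypothetical helper: the manuscript only states that titers are
    expressed as OD450 per total protein; the mg unit is an assumption.
    """
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return od450 / total_protein_mg


# Made-up readings, for illustration only (not data from the study).
example_titer = normalized_titer(od450=1.2, total_protein_mg=0.4)
```

      Normalizing to total protein in this way would make nasal lavage and serum samples comparable despite differences in sample dilution, which is presumably why a per-protein unit was chosen.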

      Furthermore, what do antibody titers in the nasal lavage fluid and serum look like post-intramuscular vaccination with the same vaccine and dose? Comparison of titers to the intramuscular route as well as to the PBS control would make this data more impactful. 

      We appreciate your consideration. We have not conducted experiments comparing the effects of intramuscular and intranasal administration using the same dosage and adjuvant. Cholera toxin has primarily been used as an adjuvant for nasal immunization, but it is seldom applied for intramuscular injection. We are interested in its impact on the immune compartment when using cholera toxin as an adjuvant for intramuscular injection. We plan to conduct further experiments in the future.

      Lastly, in Figure 1B, the detection of nasal IgG is not shown even though the authors assess nasally-derived IgG in the spleen further into the study.  

      Since the number of lymphocytes that can be collected from the nasal mucosa is limited, there is an insufficient capacity to isolate IgG+ plasma cells after collecting IgA+ plasma cells. Therefore, conducting such an experiment on mice is technically challenging. A larger animal, such as rats, will be necessary to perform this experiment. Further investigation is needed to determine whether antigen-specific IgG+ plasma cells, sharing V-(D)-J with nasal IgA, can be detected in the nasal mucosa.

      (2) There appears to be something amiss with the IgA stain. It is smushed up against the X-axis. Better flow cytometry profiles should be shown. Likewise in Supplemental Fig. 1A, their IgA stain appears to not be working. This must be addressed using positive and negative controls. 

      We have updated the FACS plots to show the IgA+ plasma cells in Fig. 1B, and the detailed gating strategy is outlined in the Fig. 1B legend. Please find the relevant statements on line 115-120.

      (3) We do not know the purity of the samples that were subjected to SPR and since the legend of Fig. 4 is partially incorrect, it was difficult to know how this experiment was done. 

      The S-IgA used in the SPR analysis shown in Figure 4 is a mixture of dimers, trimers, and tetramers, and the observed values are believed to reflect the affinity of the S-IgA in the nasal mucosa. Please refer to the revised version (line 278-280).

      (4) Fig. 5 results need to compare with some of the well-characterized mAb (IgG) to understand the biological significance of these neutralizing titres. 

      We have summarized the characteristics of the four types of nasal IgA in Fig. 7 and in the Discussion. Please refer to the revised version (line 405-422).

      Communication of results: 

      (1) Authors could improve the communication of their results by introducing the vaccination protocol in the results section accompanied by a diagram of the vaccination strategy (nature of the Ag, route, and frequency). This could be Fig. 1A.

      A schematic diagram of the vaccination protocol is presented in Fig.1.

      (2) Care should be taken with some of the terminology. Intranasal is the accepted term but authors sometimes use "internasal". The term "immunosuppression" on page 2 could be misleading as it means something different to other audiences. The distinction when speaking about "protection from harmful pathogens" should be made between protection against infection (ie sterilizing immunity) vs protection against disease (ie morbidity and mortality). Instead of "nose", one should say "nasal". Nose-related could be rephrased as "potentially nasal-derived". P.5, line 2 didn't make sense: "IgG+ plasma cells that express nose-related IgA"...

      In many places, Spike is missing its "e".

      We have made the correction accordingly.

      (3) Page 3: The lumping of the human and animal SARS-CoV-2 intranasal studies together is a bit misleading. Very little has worked for intranasal vaccination against SARS-CoV-2 in humans at this point in time (although hopefully that will change soon!). Authors should specify which studies were done in animals and which were done in humans. 

      The manuscript has been revised to include two citations on line 73-75 (Ewer et al., 2021 and Zhu et al., 2023).

      (4) What is ER-tracker? It comes out of nowhere and should be explained why it was used to the reader (as well as why they used the other markers) to sort for Spike-specific PC. 

      ER-Tracker is a fluorescent dye that is highly selective for the endoplasmic reticulum of living cells. Because plasma cells have an expanded endoplasmic reticulum for properly folding and secreting large quantities of antibodies, using ER-Tracker along with anti-CD138 facilitates the isolation of plasma cells from lymphocytes without the need for additional antibodies. Please refer to the revised version for details (line 130-134).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.  

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.  

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.  

      Strengths:  

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-6 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:  

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated; however, they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; therefore, it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite (at least part of) the nervous system still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority though) use a less stringent definition for the word habituation than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here (see, e.g., PMID: 22337205; PMID: 18947923; PMID: 17258858; PMID: 20685171; PMID: 15978487). In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of triggering confusion about the exact nature of the process, for those considering the stricter definition of the word “habituation” and those not in the narrower field of pain research. In the revised manuscript, we have thus changed this terminology to “adaptation”. Also following suggestions from Reviewer 2, we have strengthened the description of the protocol in the Results section and clarified why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal. One of the most convincing pieces of evidence that it cannot be solely explained by “damage” or “exhaustion” is simply the existence of non-adapting mutants (like cmk-1(lf)) or pharmacological treatments (Cyclosporin A) that block the adaptation effect and enable worms to continuously reverse for hours without any problems.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we now more extensively cover in the Discussion section.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and in addition remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevent detecting all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      We have now extended the discussion of the limited overlap of the two datasets in a dedicated paragraph in the Discussion. We also clarify that we tend to place more trust in the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (as TAX-6 was) being the most convincing, as they could be validated in a second type of experiment.
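      The kind of overlap statistic raised by the reviewer can be sketched as follows. This is only an illustration of the arithmetic, not the authors' analysis pipeline: the function name is hypothetical, and every identifier in the toy hit lists other than TAX-6 (which the response reports as present in both datasets) is a made-up placeholder.

```python
def overlap_report(small_hits: set[str], large_hits: set[str]) -> tuple[int, float]:
    """Return (number of shared hits, fraction of small_hits absent from large_hits)."""
    if not small_hits:
        raise ValueError("small_hits must not be empty")
    shared = small_hits & large_hits  # hits found in both lists
    missing_fraction = 1 - len(shared) / len(small_hits)
    return len(shared), missing_fraction


# Toy hit lists with placeholder names; these are NOT the published lists.
small = {"TAX-6", "sub_A", "sub_B", "sub_C", "sub_D", "sub_E", "sub_F"}
large = {"TAX-6", "sub_X", "sub_Y", "sub_Z"}

n_shared, frac_missing = overlap_report(small, large)
print(f"{n_shared} shared hit(s); {frac_missing:.0%} of the smaller list is absent from the larger")
```

      Because each hit list is defined with strict, independently applied criteria, a high missing fraction of this kind is compatible with many false negatives in each list rather than with unreliable hits.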

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.  

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so it was an added value to run both. We propose that the subset of targets present in both datasets is the most solid list of candidates. We have reinforced this point in the discussion.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.  

      We kept the focus of the article primarily on TAX-6 because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we did not have cnb-1(gf) mutants with which to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) mutants prevented any further follow-up. We have added a supplementary file presenting the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).  

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer comment, we made modifications to the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, that cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we have now clarified in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from 2 assumptions of the model (which we have clarified in the revised ms):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 antiadaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of Cyclosporin A. Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and imbalances the regulation toward faster adaptation (the role of TAX-6, and ipso facto that of Cyclosporin A, becoming negligible in the CMK-1(gf) background).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We have modified the label to “normal adaptation” and left a note in the legend that an apparently normal adaptation phenotype seems to be the default situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. While RIM is indeed a neuropeptide-rich neuron, all these neurons actually express neuropeptides. Following this helpful suggestion, we have slightly expanded the discussion of hypothetical cellular pathways that can be modulated downstream of CMK-1 in AFD. We also slightly lengthened the discussion to mention hypothetical post-synaptic targets of TAX-6 within interneurons based on the literature.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.  

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be pooled for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and their robust non-adapting phenotype served as a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).  

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable efforts to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were unfortunately not able to obtain consistent results. The only useful data regarding the CMK-1 place of action are therefore the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):  

      Summary:  

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:  

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.  

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.  

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.  

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.  

      Thanks a lot for these positive remarks.

      Weaknesses:  

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.  

      We understand this point and we have carefully considered (and reconsidered) the way to articulate the report. However, we could not present the story much differently, as we would have had no justification to investigate the role of TAX-6 and its interaction with CMK-1 if we had not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to trigger wrong expectations among readers. We have thus extensively revised the abstract to clarify this point. Furthermore, we have reinforced this point in the last paragraph of the introduction and in the conclusion paragraph of the Discussion.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and have explicitly mentioned this aspect in the abstract, in the end of the introduction, and in the discussion section.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.  

      We are not aware of any study having shown that Calcineurin is a direct target of CaM kinase I. But it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We have complemented the text by mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      (1) The authors might consider reorganizing the results, so that the substrate phosphorylation analysis follows the cmk-1 habituation data, as it may not be clear to the reader why you are looking for substrates downstream of cmk-1 at that point. Or the authors could mention the previous habituation data for cmk-1 at the beginning of the results.  

      Thank you. This is something that we considered while (re-)writing. However, we prefer to keep the CMK-1 data side-by-side with the TAX-6 data in the Results section. Nevertheless, we have modified the last paragraph of the introduction to better transition to, and justify the specific interest of, searching for CMK-1 targets in the context of the present study.

      (2) Line 209: 'controls' is too strong a word. 'regulates' would be better, and it should be stated that this is for 'spontaneous reversal behavior'.  

      Thank you. This was modified.

      (3) Line 359: we suspect that these reflect functional enrichments.  

      We don’t see what exactly would be wrong with the original sentence. The proposed change (if it is a proposed change) would completely obliterate the intended meaning of our sentence. We rewrote the sentence to be as clear as possible, as follows: “Even if we cannot rule out an actual inclination of the CaM kinase pathway to regulate these processes, we suspect that these GO term enrichments rather reflect an analytical bias toward abundant proteins.”

      (4) Line 563: In this subsection, it is not made clear when the T0 and T60 heat pulses are given, in relation to the 20s ISI heat pulses given for 60 minutes. Are they the first and last pulse, or given some time before or after this train of heat pulses?  

      Thanks for spotting this poor description, which we have improved in the revised manuscript. The heat pulse recording is given immediately before and immediately after the 60 min of repeated stimulation. After the T0 heat pulse recording there is a period of about 30 s (period of post-stimulus recording + transfer from the recording device (INFERNO) to the habituation device (ThermINATOR)). For the T60 acquisition, there is a lag of about 50 s between the last ‘habituation’ stimulus and the recording stimulus (time needed to move the plate between the habituation device and the recording device + 40 s of baseline reversal recording in the absence of heat stimuli).

      Reviewer #2 (Recommendations for the authors):  

      (1) There appears to be little to no connection between the phosphorylation site discovered in Calcineurin (S443) and the behavioral phenotypes being studied. What is the thermo-nociceptive response if phosphorylation of S443 in Calcineurin is blocked (using a S443A mutation) and/or combined with CMK-1 gain of function?  

      Thanks for the suggestion. The suggested analysis is complicated by several factors. First, the tax-6(lf) mutant is not directly suitable for rescue analysis (until we have identified a way to restore baseline reversal), so we cannot use a S443A-carrying rescue transgene. Second, the truncated TAX-6(GF) mutant lacks the C-terminal part, including S443, so we cannot introduce a S443A mutation in this context. The remaining approach would be to modify the endogenous locus. This again is complicated by the fact that S443 exists in two different isoforms (with conserved RxxS motifs in two different alternative exons). It will be very difficult to perform these experiments until we know more about the expression pattern and function of the respective isoforms. This is work in progress, but this analysis will need to await a future publication.

      (2) The authors should state clearly if Calcineurin is a novel substrate of CaM Kinase or if this is already known in the field.  

      We have complemented the text to mention that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

      (3) The logical flow of the manuscript could be improved given that CMK-1 and Calcineurin appear to act in different cells to regulate nociceptive habituation.  

      As detailed above, we have considered this point carefully and modified the introduction and the abstract. The discussion about the two places of action was also improved.

      (4) More detail about the experimental methods used for the heat-evoked reversals should be included in the Results section.  

      Thanks for the suggestion. We have improved the description in the Methods section and expanded the partial description in the Results section, so that readers can hopefully proceed without needing to go back and forth to the Methods.

      (5) Check for typos. For example: line 197 - fix typo "...to a series repeated heat stimulation...".  

      Thank you. We have carefully read the revised manuscript to correct remaining typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript described a structure-guided approach to graft important antigenic loops of the neuraminidase to a homotypic but heterologous NA. This approach allows the generation of well-expressed and thermostable recombinant proteins with antigenic epitopes of choice to some extent. The loop-grafted NA was designated hybrid.

      Strengths:

      The hybrid NA appeared to be more structurally stable than the loop-donor protein while acquiring its antigenicity. This approach is of value when developing a subunit NA vaccine that is difficult to express, because antigenic loops could potentially be grafted onto a stable NA scaffold to transfer strain-specific antigenicity.

      Weaknesses:

      However, major revisions are needed to better organize the text and figures and to clarify a number of points. There are a few cases in which a later figure was described first, data in the figures were not sufficiently described, or there were mismatched references to figures.

      More importantly, the hybrid proteins did not show any of the advantages over the loop-donor protein in the format of VLP vaccine in mouse studies, so it's not clear why such an approach is needed to begin with if the original protein is doing fine.

      We thank the reviewer for their helpful comments. We have incorporated feedback from the reviewers to improve the manuscript. Please see our point-by-point response.

      The purpose of loop-grafting between H5N1/2021 (a high-expressor) and the PR8 virus was not to improve the expression of PR8, which is already a good expressing NA. Instead, the loop-grafting and the in vivo experiments were done to show the loop-specific protection following a lethal PR8 virus challenge.

      Reviewer #2 (Public review):

      In their manuscript, Rijal and colleagues describe a 'loop grafting' strategy to enhance expression levels and stability of recombinant neuraminidase. The work is interesting and important, but there are several points that need the author's attention.

      Major points

      (1) The authors overstress the importance of the epitopes covered by the loops they use and play down the importance of antibodies binding to the side, the edges, or the underside of the NA. A number of papers describing those mAbs are also not included.

      We have discussed the distribution of epitopes on the NA molecule in the Discussion section "The distribution of epitopes in neuraminidase" (new line number 350). In Supplementary Figures 1 and 2, we have compiled the epitopes reported for polyclonal sera and mAbs via escape virus selection or crystal structural studies. There are 45 example residues from escape virus selection, and we found that approximately 90% of the epitopes are located within the top loops (Loops 01 and Loops 23, which include the lateral sides and edges of NA). We have also included the epitopes of underside mAbs NDS.1 and NDS.3 in Supplementary Figure 2. Some of the interactions formed by these mAbs are also within the L01 and L23 loops. All relevant references are cited in Supplementary Figures 1 and 2.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      (2) The rationale regarding the PR8 hybrid is not well described and should be described better.

      We described the rationale for the PR8 hybrid (new lines 247-250). For clarity, we have added the following sentence within the section "Loop transfer between two distant N1 NAs:...."

      (new lines 255-258):

      "mSN1 showed sufficient cross-reactivity to N1/09 to protect mice against virus challenge. Therefore, we performed loop transfer between mSN1 and PR8N1, which differ by 18 residues within the L01 and L23 loops and show no or minimal cross-reactivity, to assess the loop-specific protection."

      (3) Figure 3B and 6C: This should be given as numbers (quantified), not as '+'.

      We have included the numerical data in Supplementary Figure 6. The data are presented in a semi-quantitative manner for simplification. To improve clarity, we have now added the following sentence to the Figure 3c legend: "Refer to Supplementary Figure 6 for binding titration data".

      (4) Figure 5A and 7A: Negative controls are missing.

      A pool of Empty VLP sera was included as a negative control, showing no inhibition at 1:40 dilution. In the figure legends, we have stated "Pooled sera to unconjugated mi3 VLP was negative control and showed no inhibition at 1:40 dilution (not included in the graphs)"

      (5) The authors claim that they generate stable tetramers. Judging from SDS-PAGE provided in Supplementary Figure 3B (BS3-crosslinked), many different species are present including monomers, dimers, tetramers, and degradation products of tetramers. In line 7 for example there are at least 5 bands.

      The tetrameric conformation of the soluble proteins is evidenced by the size-exclusion chromatograms shown in Figures 3a and 6b. The BS3-crosslinked SDS-PAGE data are only suggestive, indicating that the protein is a tetramer if a band appears at ~250 kDa. However, depending on the reaction conditions, lower molecular weight bands may also be observed if crosslinking is incomplete.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      - Description of Figure 2 on page 3 should go before Figure 3 lines 87-105 or swap the order of the two figures.

      We have moved lines 91-96, which refer to Figure 3, to appear after Figure 2.

      - Figure 3a, an EC50 should be calculated for both NA activity assay.

      Figure 3a has been updated to include the EC50 and AUC (Area under curve) values for both NA activity assays. The same update has also been made for Figure 6b.

      - Line 150, I'm not sure it's appropriate to cite a manuscript that was in preparation but not published. I'm referring to the two mAbs AG7C and AF9C that were claimed to bind to the L01 and L23 loops but not.

      We have changed the "manuscript in preparation" to "personal communication with Dr. Yan Wu, Capital Medical University".

      - The description in Figure 4a is lacking.

      We have added a detailed description for Figure 4a.

      - Figure 4c, sufficient description is needed. For example, the cavity should be outlined and annotated, what is the role of Val149? Why the first monomer is assigned a number of II and the second monomer with a number of I.

      We have added a detailed description for Figure 4c and amended the figure as per the reviewer’s suggestions.

      - Figure 5a, in addition to ELLA data to mSN1 and N1/09, ELLA data to N1/19 should also be measured and shown. Figure S7, please show IC50 instead of curves for better comparison.

      We included IC50 values for mSN1 and N1/09 as we intended to associate the loops with protection. Graphs for N1/19 have not been reported, but the IC50 titres from pooled sera are shown in Supplementary Figure 7 as a representation. Due to the limited sera samples sourced from tail vein bleeds, these assays were performed using pooled sera, which represent the total response (established in a number of experiments).

      - Line 234-238, the author made a statement about the data shown in Figure 7b "These results mirrored several studies in the literature which showed that immunization with the 2009 N1 could provide at least partial protection in mice and ferrets to the avian H5N1 challenge". The data did not reflect that. In Figure 5b, mSN1 protects as well as other proteins. In fact, there was no advantage of N109 and N109 hybrid over mSN1 in protection against the homologous H1N109. Although higher levels of NAI antibodies were induced with the homologous protein in Figure 5a. The protection could be contributed by non-NAI antibodies, so the authors should measure binding antibodies. The author may increase the challenge dose from 200 LD50 to 1000 LD50 to see a difference due to the strong immunogenicity of the nanoparticles vaccine plus addavax. Otherwise, it looks like loop grafting is not necessary as heterologous NA could broadly protect.

      We agree that mSN1, despite its low NAI titres, was as protective as the homologous NA or its hybrid NA against H1N1/09 virus challenge at 200 LD50. There may be additional protective components, including non-NAI antibodies, in the homologous groups that contributed to the protection.

      We assessed sera binding to H1N1/2009 and found that the binding antibody levels were also lower in the mSN1 group. The corresponding graph has now been added in Figure S7d. It was difficult to determine the NAI titre required to confer protection in this experiment. For this reason, we later chose PR8 as the challenge virus to demonstrate loop-specific protection.

      We are uncertain whether a 1000 LD50 challenge would have helped establish a correlation between protection and NAI IC50 titres, as the dose used is already lethal for DBA/2 mice.

      - Why would the authors separate work with N1/09 and N1/19 from PR8 N1? To this reviewer's understanding, they are all the same strategies with increasing numbers of dissimilar residues from N1/09 (12) to N1/19 (16) and to PR8 (18). They are all characterized by the same approaches in vitro and in vivo.

      We had two different goals for making hybrids with N1/09 and PR8 N1, therefore, we have presented these results separately.

      (1) For N1/09 and N1/19, we showed that loop-grafting improved protein yield and stability. Additionally, we showed that the N1/09 hybrid can be as protective as the homologous protein.

      (2) PR8 N1 is a high-yielding protein, so loop grafting did not significantly increase its yield. However, the PR8 virus challenge confirmed loop-specific protection.

      - For in vivo study testing the PR8 construct, although PR8 and PR8 hybrid protect better than the heterologous mSN1, the hybrid again did not show any advantages over the PR8 original proteins.

      That's correct - the PR8 hybrid was not advantageous over the original PR8 protein. However, the purpose of this experiment was to demonstrate loop specific protection. The PR8 hybrid (PR8 loops - mS scaffold) protected 6/6 mice, whereas mS hybrid (mS loops - PR8 scaffold) provided no protection.

      - Line 243-249, lack of reference to figures.

      References to Supplementary Figure 7b,c and Figure 2 have been added.

      - What was the reason that the challenge was done with 200 LD50 for 2009 H1N1 and 1000 LD50 for PR8?

      Viruses were titrated in the BALB/c strain for PR8 virus and the DBA/2 strain for X-179A (H1N1/2009) virus. These doses were selected based on their lethality and the time required to reach the endpoint (~20% weight loss) post-infection, which is 5-6 days. Most studies in the literature have used 10 LD50 or higher; thus the virus doses we used are relatively high.

      - Line 268, there is no Figure 5C.

      This was a mistake and has been corrected to Figure 6c.

      - Line 275 what are the readers supposed to see in supplementary Figure 5a? There is not enough description for the referred figures.

      A sentence has been added to Fig S5a description, to make a point about recognition of the NA scaffold by mAb CD6. "Binding by mAb CD6 is predominantly scaffold dependent and occurs across two protomers"

      - The discussion is very long and some of it is not relevant to the study. For example, the role of the tetramerization domain and the basis for structurally stable tetramer formation, were not the focuses of this study.

      We felt it was important to discuss the tetramerisation domain and the basis for stable tetramer formation. A previous study by Ellis et al. used the VASP tetramerisation domain and introduced multiple NA interface mutations to achieve a more stable closed conformation. In contrast, NA proteins used in our study required the tetrabrachion tetramerisation domain to form a properly assembled tetramer.

      In lines 382-383, there is one unfinished sentence.

      This is corrected.

      The definition of the loops is also confusing. Line 381, the author stated that in the N1/19 hybrid design, residue N200S, could have been considered as part of the loop B2L23, and was it not?

      The designation of loop ends should not be rigid but rather based on multiple factors such as their proximity to antigenic epitopes, charge, and hydrophobicity. This is discussed in the "Definition of loops" section.

      - Figure 1a and Figure S2, please provide sufficient descriptions, what do the blocks in different colors mean?

      We have updated the Figure 1a legend to indicate the colours.

      The descriptions for Figures S1 and S2 have also been revised for clarity.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Line 37: Should be 'Influenza virus neuraminidase'.

      This is corrected.

      (2) Line 65: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ indicate that protective mAbs bind all over the NA head domain.

      We have discussed the epitopes on the NA head in detail in the section "The distribution of epitopes on Neuraminidase". In Supplementary Figures 1 and 2, we compiled several studies, including those on polyclonal sera and mAbs epitopes, emphasizing that loops 01 and 23 are the predominant antibody targets (~90%). Some antibodies also bind to the underside of NA. We have discussed and referenced these studies accordingly.

      A new figure has been added [Figure 1b (ii)] to illustrate the surface mapping of epitopes on NA.

      The first reference has been included in both our discussion and Supplementary figure 1.

      The NA epitopes discussed in the second reference have also been incorporated into our discussion and Supplementary Figures 1 and 2. Note that the E258K mutation on the NA underside was not relevant to mAbs; it arose randomly during passaging of the H3N2 A/New York/PV190/2017 virus.

      The third reference pertains to murine mAbs against influenza B virus NA.

      (3) Lines 71, 72, and throughout: 'et al.' should be in italics.

      All "et al." have been italicised.

      (4) Many abbreviations are not defined including CHO, SDS-PAGE, MUNANA, mi3, HEPES, BSA, TPCK, MWCO, HRP, PBS, TMB, TCID50, LD50, MES, PEG, PGA, MME, PGA-LM.

      The text has been amended to define these abbreviations.

      (5) Line 209: Shouldn't this be ID50 instead of IC50? Also, it is not defined.

      IC50 has been defined.

      (6) Line 210, line 346, line 581-582: No need to capitalize letters at the beginning of words mid-sentence.

      This is amended.

      (7) Line 227: Is 2009 H1N1 NA meant?

      This has been changed to "H1N1/2009 neuraminidase"

      (8) Line 310: Is this really quantitatively true? (see major comment 1).

      Based on the compilation of epitopes from published NA mAbs and polyclonal sera (via escape mutagenesis and NA-Fabs crystal structures), it is accurate to state that the protective epitopes are primarily located within loops 01 and 23.

      Please also refer to our response to minor point 2. 

      (9) Line 352 and throughout the manuscript: 'in vitro' should be in italics.

      This is amended.

      (10) Line 355: https://pubmed.ncbi.nlm.nih.gov/35446141/, https://pubmed.ncbi.nlm.nih.gov/33568453/ and https://pubmed.ncbi.nlm.nih.gov/28827718/ should be included here.

      Studies reporting epitopes on Influenza A neuraminidase have been compiled in Supplementary Figures 1 and 2 and cited appropriately.

      (11) Line 365: https://pubmed.ncbi.nlm.nih.gov/35446141/ and https://pubmed.ncbi.nlm.nih.gov/33568453/ also describe epitopes on the underside of the NA.

      Please refer to the above response to point 10.

      (12) Line 365: Reference https://pubmed.ncbi.nlm.nih.gov/37506693/ is missing here.

      The reference has been added.

      (13) Line 369-371: Is it really a minority?

      In terms of the protective response, the majority of the antibody response is directed towards loops 01 and 23, which form the top antigenic surface. The term 'lateral' is used in some literature to describe NA mAb epitopes; loops 01 and 23 also encompass the lateral regions.

      To clarify this, we have added the following sentence to the Discussion section - "The distribution of epitopes on neuraminidase"

      "It is important to note that loops 01 and 23 include a portion of epitopes that have been described in the literature as side, lateral, or underside (see mAbs NDS.1, NDS.3, and CD6 in Supplementary Fig. 2)"

      Additionally in our studies in mice, we showed that protection is mediated by antibodies targeting the loops (Figure 7). We are uncertain about the binding response to the NA underside, but the NA inhibiting and protective response to the underside appears to be minimal.

      Furthermore, Lederhof et al. showed that among the 'underside' mAbs, NDS.1 protected mice against virus challenge, whereas NDS.3 did not. In our analysis (Supplementary Figure 2), NDS.1 makes eight-residue contacts with B4L01 and B5L01, whereas NDS.3 makes five-residue contacts with B3L01 and B4L01.

      (14) Line 530: The A in ELLA already stands for assay.

      This is corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This manuscript by Kremer et al. characterizes the tissue-specific responses to changes in TFAM levels and mtDNA copy number in prematurely aging mice (polg mutator model). The authors find that overexpression of TFAM can have beneficial or detrimental effects depending on the tissue type. For instance, increased TFAM levels increase mtDNA copy number in the spleen and improve spleen homeostasis but do not elevate mtDNA copy number in the liver and impair mtDNA expression.

      Similarly, the consequences of reduced TFAM expression are tissue-specific. Reduced TFAM levels improve brown adipocyte tissue function while other tissues are unaffected. The authors conclude that these tissue-specific responses to altered TFAM levels demonstrate that there are tissue-specific endogenous compensatory mechanisms in response to the continuous mutagenesis produced in the prematurely aging mice model, including upregulation of TFAM expression, elevated mtDNA copy number, and altered mtDNA gene expression. Thus, the impact of genetically manipulating global TFAM expression is limited and there must be other determinants of mtDNA copy number under pathological conditions beyond TFAM. 

      Strengths: 

      Overall, this is an interesting study. It does a good job of demonstrating that given the multi-functional role of TFAM, the outcome of manipulating its activity is complex. 

      Weaknesses: 

      No major weaknesses were noted. We have minor suggestions for improving the clarity of the manuscript that are detailed in the "recommendations for the authors" section. 

      We thank the reviewer for the suggestions and addressed them as described in the "recommendations for the authors" section.

      Reviewer #2 (Public review): 

      Summary: 

      This study by Kremer et al. investigates the impact of modulation of expression of TFAM, a key protein involved in mitochondrial DNA (mtDNA) packaging and expression, in mtDNA mutator mice, which carry random mtDNA mutations. While previous research suggested that increasing TFAM could counteract the pathological effects of mtDNA mutations, this study reveals that the effects of TFAM modulation are tissue-specific. These findings highlight the complexity of mtDNA copy number regulation and gene expression, emphasizing that TFAM alone is not the sole determinant of mtDNA levels in contexts where oxidative phosphorylation is impaired. Other factors likely play a significant role, underscoring the need for nuanced approaches when targeting TFAM for therapeutic interventions. 

      Strengths: 

      The data presented in the manuscript is of high quality and supports major conclusions. 

      Weaknesses: 

      The statistical methods used are not clearly described, and some marked nonsignificant results appear visually significant, which raises concerns about data analysis. 

      Data presentation requires improvement. 

      We thank the reviewer for the comments. We updated the text in the Materials and Methods section to state the statistical methods and improved the figures as described in detail in the "recommendations for the authors" section.

      Recommendations for the authors:

      (1) Please include testis data in Figure 2 given previous work by authors showing that elevated mtDNA copy number can improve testis function. It would be interesting to compare the changes in mtDNA copy number in testis to these other tissues.

      We measured mtDNA copy number in testis using the CytB probe and added it as Supplementary figure 2 A.

      (2) The clarity of Table 1 could be improved. It is difficult to know whether the changes in the TFAM to mtDNA ratio are driven by changes in TFAM levels or mtDNA copy number. A suggestion is to include the TFAM and mtDNA values in parenthesis next to each listed ratio.

      We updated Table 1 and included the values of the normalized TFAM and mtDNA levels in parentheses.

      (3) The authors should consider showing TFAM western blot data in Figure 1.

      We thank the reviewer for the suggestion but would like to keep the TFAM western blot data with the other western blot data for the respective tissue.

      (4) The graphs for qPCR data (e.g. Figure 2) show mRNA or mtDNA levels relative to the control, which is always set to 1. Why, then, does the control group display error bars?

      For the normalization of the data to the WT group, we first calculate the average of the values from all the samples of the WT group. We then divide all values from the samples of all groups, including the WT group, by that average value. By doing so, we set the average value of the WT group to 1 and express all values from all samples of all groups, including the WT group, relative to this average value. Differences between the samples of the WT group are hence retained and allow for error calculations and the display of error bars.  
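
      The normalization described above can be illustrated with a minimal sketch. It assumes plain Python with the standard library only; the group names and raw values are made up for illustration and are not data from the study, and the function name normalize_to_control is hypothetical.

```python
from statistics import mean, stdev


def normalize_to_control(groups, control="WT"):
    """Divide every sample in every group by the mean of the control group."""
    control_mean = mean(groups[control])
    return {
        name: [value / control_mean for value in values]
        for name, values in groups.items()
    }


# Hypothetical raw values in arbitrary units (not data from the study).
raw = {
    "WT": [8.0, 10.0, 12.0],
    "mutator": [4.0, 5.0, 6.0],
}

normalized = normalize_to_control(raw)

# The control mean becomes 1, but the spread between control samples is kept,
# which is why the control bar can still carry an error bar.
wt_mean = mean(normalized["WT"])
wt_sd = stdev(normalized["WT"])
mutator_mean = mean(normalized["mutator"])
```

      Because each control sample is divided by the group mean rather than by itself, the control group averages to 1 while its sample-to-sample variation survives the normalization.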

      (5) Page 3 second sentence to the last: overexpression of TFAM leads to...? Did the author mean mtDNA?

      We updated the text to “Heterozygous knockout of Tfam in wild-type mice results in ~50% decrease of mtDNA levels, whereas moderate overexpression of Tfam leads to ~50% increase in mtDNA levels25,26”

      (6) The sentence "In summary, mtDNA copy number regulation is more complex than previously assumed and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner" - not clear who assumed (references?) and based on what data, please rephrase.

      We updated the text and it now reads “In summary, mtDNA copy number regulation is more complex than suggested by previous studies23–27 and the TFAM-to-mtDNA ratio seems to be finely tuned in a tissue-specific manner.”

      (7) The significant increase in complex II activity under TFAM overexpression (Figure 3) warrants additional discussion.

      We updated the Results section and it now reads “We detected increased levels of the complex II subunit Succinate Dehydrogenase Complex Iron Sulfur Subunit B (SDHB). Complex II is exclusively nuclear encoded and a compensatory increase upon impaired mitochondrial gene expression has been observed before32.

      We proceeded to measure the enzyme activities of individual OXPHOS complexes in liver mitochondria (Fig. 3C). The complex I and complex IV activities were reduced to about 50% in Polg-/mut; Tfam+/+ mice in comparison with wild-type mice (Fig. 3C). However, we did not see any further alteration of the reduced enzyme activities induced by TFAM overexpression or reduced TFAM expression (Fig. 3C). Interestingly, we detected a significant increase in complex II and complex II + complex III activity upon TFAM overexpression, which can partially be explained by the increased complex II protein levels we observed in Polg-/mut; Tfam+/OE mice (Fig. 3, B and C).”

      (8) The statistical methods used should be explicitly stated. Some results marked as non-significant appear visually significant, for example, mt-Cytb in Figure 2C, Supplementary Figure 2B).

      We updated the text in the Materials and Methods section to state the statistical methods and it now reads “Statistical analysis and generation of graphs were performed with GraphPad Prism v9 software except for quantitative mass spectrometry data which was analyzed and plotted using R as described above. Statistical comparisons were performed using one-way analysis of variance (ANOVA), and post hoc analysis was conducted with Dunnett’s multiple comparisons test. Values of P < 0.05 were considered statistically significant.”
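
      As a rough illustration of the first step of the analysis named above, the sketch below computes a one-way ANOVA F statistic by hand. It is not the authors' workflow, which used GraphPad Prism; the function name and the input values are hypothetical, and the Dunnett post hoc comparison and the P value are not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical measurements for a control group and two test groups.
control = [1.0, 2.0, 3.0]
group_a = [2.0, 3.0, 4.0]
group_b = [5.0, 6.0, 7.0]

f_statistic = one_way_anova_f([control, group_a, group_b])
```

      A large F statistic indicates that the group means differ by more than the within-group scatter would suggest; a post hoc test such as Dunnett's then compares each test group against the single control group.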

      Minor points: 

      (1) Replace numerical indications of significance with asterisks for consistency.

      We replaced all numerical indications of significance with asterisks.

      (2) Abbreviations SKM and BAT are not defined.

      We removed the mention of SKM (skeletal muscle) as the data from this tissue were not included. The Introduction reads “In contrast, in brown adipose tissue (BAT), a decrease in TFAM levels normalized Uncoupling protein 1 (Ucp1) expression.”

      (3) Use uniform scales across bar graphs in Figure 2 to improve clarity.

      We updated Figure 2 to have uniform scales.

      (4) Remove or increase the transparency of data points in Figure 1A to make group averages more discernible.

      We removed the data points in Figure 1A.

      (5) Add a Y-axis title to Figure 1C.

      We added the Y-axis title “Heart / body weight” to Figure 1C.

      (6) Size of the font used in some figures (4?) is not appropriate.

      We increased the font size for the figures.

      (7) All figure legend titles need work. Insert "expression" after TFAM in the Figure 2 title, Change the title to "Modulation of TFAM expression..." in Figure 4. 

      The figure legends now read as follows:

      “Figure 2: Modulation of TFAM expression affects mtDNA copy number in a tissue-specific manner.”

      “Figure 4: Alteration of TFAM expression does not affect the heart phenotype of mtDNA mutator mice.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper are compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus and causes downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of a natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygote (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs, but the reduction in numbers does not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (figure 5A+5B), the amount of ChIP-qPCR of H3K9 was decreased by about 26% (Figure 6F), and ChIP-qPCR of Piwil1 was decreased by 35% (Figure 6G), so we don't think there is a big discrepancy. On the other hand, the amount of ChIP-qPCR of H3K9 in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while ChIP-qPCR of Piwil1 was increased by 50%, so there may be a mechanism for H3K9 regulation of Meioc that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. Significant increases in the 18-35nt small RNA of 18S, 28S rRNAs and R2 were not detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia, compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 has been revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided the molecular evidence on the antagonism of Meioc to Piwil1 in the rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by the piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. I corrected the parts pointed out.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added explanation about SSC, sox17::egfp positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10mM or more. We added the results in the Supplemental Figure S2G. In addition, at 1mM, growth was perturbed in fast-growing 32≤-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above, I don't see addressing these questions as pertinent for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      GSC is a writing mistake. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTDHC2: Please explain more about what protein YTDHC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic markers: should be "meiotic prophase I markers".

      Thank you for pointing out the inaccurate description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add PH3 is "histone H3 phospho-S10 ".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all males in the mutant. If so, what Figure S1B shows? These cells are spermatocytes? No "oocytes" developed is correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: mouse Meioc rectangle lacks a right portion of it. Please explain two mutations encode a truncated protein in the main text.

      I apologize. It seems that the portion was missing during the preparation of the manuscript. We corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: What "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C:, B1-4, C1-7: Rather use spermatogonia etc as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added mention of the two populations.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate two populations in the rRNA concentrations (low and high) in the 1-2-cell stage. How much % of each cell is?

      We added mention of the two populations and the percentage of cells in each.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      The decrease in rRNA signal intensity in spermatocytes was added.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think that it is important to note that we were unable to find cells with upregulated rRNA signals, and therefore changed to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schema of rDNA locus and primer sites such as Figure S3A right (now Figure 2F) in Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and transferred the figure to the main Figure (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images in the Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germinal granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We have described that we have sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added that piwil1-/- are viable in L231.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We have modified some text, including the connections between different sections in the results part and the objectives and roles of various analyses in each section, thus enhancing the coherence between the contexts and clarifying the objectives and functions of each analysis. We believe this will help readers better understand the main content of the entire text.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      Using RNA-Bulk data, we evaluated the tumor immune microenvironment through various methods to assess immune infiltration levels and responses to immunotherapy. We found that the results were largely consistent with those presented in the manuscript, providing strong support for our viewpoints. We also acknowledge the limitations of findings from bioinformatics analysis. In our upcoming research, we plan to develop organoid models with gene expression patterns of both CS1 and CS2 subtypes, using these models as a foundation for studying the tumor immune microenvironment.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      In this manuscript, we employed the NTP algorithm to classify malignant cells identified by the CopyKAT algorithm using characteristic genes of CS1 and CS2 subtypes. This approach is similar to the method we used earlier to classify patients in the ICGC cohort with the same subtype genes. We consider this classification method valid.

      After classifying the malignant cells, we performed metabolic and cell communication analyses on the CS1 and CS2 subtype cells, revealing significant differences in biological pathways enriched by differential genes, metabolic levels, and cell signaling patterns. These differences align with variations observed in prior classifications and analyses based on RNA-Bulk data.

      We also acknowledge that validating the classification method solely with the single-cell dataset from this study is insufficient. We analyzed GSE202642 using the same processes and methods as GSE229772, finding that the results were generally consistent, indicating that our classification method exhibits a degree of robustness at the single-cell level.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We have updated the methodology section to enhance readers' understanding of the fundamental principles involved. This analysis has two key features: first, it combines 10 machine learning algorithms to generate 101 models and ultimately selects the prognostic prediction model with the highest C-index from these 101 models; second, it utilizes the LOOCV method to analyze the training and validation sets. Compared to the conventional method of randomly dividing the training and validation sets by a fixed ratio, this approach substantially reduces the bias and randomness introduced by the splitting process. Therefore, we believe this analysis can leverage the characteristic genes of the CS1 and CS2 subtypes, combined with existing clinical data from public databases, to yield results that are more accurate and reliable than the commonly used prognostic models in previous literature, such as COX regression and Lasso regression, as well as other individual algorithms. While this analysis presents advantages over some previous modeling methods, it is essential to recognize that it remains based on analyses of public databases, and that the mathematical logic of the algorithms may obscure certain factors that are clinically relevant to patient prognosis.
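
      The selection criterion described above can be illustrated with a minimal sketch. It assumes Harrell's concordance index as the C-index and ranks candidate models by their mean C-index across cohorts; the function names, cohort labels, and toy values are hypothetical and are not taken from the authors' pipeline, and the LOOCV step is not reproduced here.

```python
def c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risk   : model risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def select_best_model(predictions, cohorts):
    """Return (name, mean C-index) of the model with the highest mean C-index.

    predictions : {model_name: {cohort_name: [risk scores]}}
    cohorts     : {cohort_name: (times, events)}
    """
    best_name, best_score = None, float("-inf")
    for name, by_cohort in predictions.items():
        scores = [
            c_index(times, events, by_cohort[cohort])
            for cohort, (times, events) in cohorts.items()
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_name, best_score = name, mean_score
    return best_name, best_score


# Toy data (hypothetical): two cohorts and two candidate models.
cohorts = {
    "train": ([1, 2, 3, 4], [1, 1, 0, 1]),
    "valid": ([5, 6, 7], [1, 0, 1]),
}
predictions = {
    "model_a": {"train": [4, 3, 2, 1], "valid": [3, 2, 1]},  # ranks risk correctly
    "model_b": {"train": [1, 2, 3, 4], "valid": [1, 2, 3]},  # ranks risk inversely
}
best_name, best_score = select_best_model(predictions, cohorts)
```

      Averaging across cohorts rather than ranking on the training cohort alone is one way to avoid favoring a model that only fits the training data; whether the authors weight the cohorts equally is an assumption of this sketch.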

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      Figure 3A: We clustered the samples based on CS1 and CS2 subtypes and displayed the immune-related cell scores of each sample as a heatmap.

      Figure 4A: The blue bars in the figure represent the average C-index of this algorithm combination in the training dataset TCGA and the validation dataset ICGC, which we have supplemented in the corresponding sections of the text.

      Figure 4B: We described this figure in the results section, which primarily aims to validate whether our prognostic prediction model can predict patient outcomes in the TCGA cohort. The results showed that after performing prognostic risk scoring on patients based on the prediction model and categorizing them into high-risk and low-risk groups, the two groups exhibited significant prognostic differences, with the high-risk group showing worse outcomes compared to the low-risk group. This indicates that our prognostic prediction model can effectively distinguish the prognostic risk differences among patients in the TCGA-LIHC cohort. We also discussed these findings in the discussion section.

      Figure 7C: We used both point color and size to visualize the levels of metabolic scores, resulting in two dimensions in the legend, which actually represent the same information. Therefore, we removed the results that used point size to indicate the levels of metabolic scores.
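
      The high-risk versus low-risk comparison described for Figure 4B can be sketched as follows. This is a minimal illustration that assumes a median cut-off on the risk score and a two-group log-rank test; the response does not state which cut-off or test was used, and all names and values below are hypothetical.

```python
import math
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    groups : 'high' or 'low' per patient
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still at risk just before time t.
        n = sum(1 for x in times if x >= t)
        n_high = sum(1 for x, g in zip(times, groups) if x >= t and g == "high")
        # Events occurring exactly at time t.
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d_high = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e and g == "high"
        )
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            frac = n_high / n
            variance += d * frac * (1 - frac) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy cohort (hypothetical): higher scores go with earlier events.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
times = [2, 3, 4, 5, 10, 12, 14, 16]
events = [1, 1, 1, 1, 1, 1, 1, 1]

groups = split_by_median(scores)
chi2, p_value = logrank_test(times, events, groups)
```

      A small p-value here indicates that the survival curves of the two groups differ, which is the kind of evidence the response refers to when it reports significant prognostic differences between the groups.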

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      In the subsequent version of the record, we will upload the important results obtained during the research to GitHub, and in this revision, we have updated some figures that may better explain the results or the robustness of the findings as supplementary materials.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We have reviewed several studies related to HCC subtype classification and the application of machine learning in this field. In the discussion section, we summarize the significance and limitations of these studies. Additionally, we discuss the characteristics of our study in comparison to previous research in this field.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We have examined the relevant data, code, and materials. We confirm that we have indicated the sources of the data and tools used in the analysis within the manuscript. Moreover, these data and tools are accessible via the websites or references we have provided.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We intend to verify our findings in future studies using tumor cell line models and animal models. We aim to identify and intervene with key molecules in the MIF signaling pathway. We will investigate how the MIF signaling pathway affects tumor sensitivity to treatment in both cell line and animal models, along with the underlying mechanisms.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We thank the reviewer for this suggestion, which has highlighted deficiencies in this area. We have made the following modifications:

      We analyzed the GSE14520 dataset from the GEO database, which comprises a cohort of 209 HCC patients and their corresponding RNA sequencing data. We validated the prognostic model obtained in this study using this cohort and found that the model effectively separates patients into high-risk and low-risk prognostic categories. Furthermore, there is a significant prognostic difference between the high-risk and low-risk patient groups. This is consistent with the results we obtained previously.

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      We have made revisions to the long and complex sentences in the manuscript without compromising its academic integrity and rationality, with the hope that this will help readers better understand the content of this study.

      During the revision process, in addition to addressing the reviewer comments, we conducted a thorough review of the analysis. In the course of this review, we identified a few errors in the data usage and have since corrected the relevant data and figures:

      Figure 4: Due to space constraints, we adjusted the composition of the figures after incorporating the validation results from the GSE14520 dataset.

      Figure 5A: We rechecked the regression coefficients included in the model, updated several more recent prognostic models, and calculated the C-index for 20 prognostic models in the TCGA and ICGC cohorts using a method consistent with previous studies.

      Figure 5C-D: We adjusted the clarity of the figures.

      Figure 8: We reclassified the selected malignant cells and updated the subtypes results. Subsequently, based on the repeatedly confirmed typing results, we comprehensively updated the analysis results of the subsequent cell communication network construction, ensuring that the entire analysis process remains consistent with previous findings. We also adjusted the composition of the figure and presented the images that could not be conveniently merged due to space constraints as Figure 9.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite of what was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context to the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” that are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have a dependent-lineage (genetic) mechanism of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, placing the information on the branches would suggest that the “trait” evolved on that part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the 'none' column from the figure, as suggested by the reviewer, and we discuss in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, the proportion of viable and trophic eggs can be reliably determined only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would require removing eggs as soon as queens laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a different abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2 B, C), there was no such development for trophic eggs (Fig. 2 E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egg-laying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and we do not have data robust enough to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.
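
      The chance argument above can be made concrete with a small sketch. It computes the probability that a colony of a given size yields only gynes, using the colony sizes quoted in the response (2, 1, 2, 1). The per-larva gyne probability `p_gyne` is a purely hypothetical value chosen for illustration, not an estimate from the study, and larvae are assumed to develop independently of one another.

```python
# Colony sizes quoted in the response (individuals produced per colony).
colony_sizes = [2, 1, 2, 1]


def prob_all_gynes(n, p_gyne):
    """Probability that all n individuals in a colony are gynes,
    assuming each develops into a gyne independently with probability p_gyne."""
    return p_gyne ** n


# Hypothetical per-larva gyne probability, for illustration only (an assumption).
p_gyne = 0.5

per_colony = [prob_all_gynes(n, p_gyne) for n in colony_sizes]
for n, prob in zip(colony_sizes, per_colony):
    print(f"colony with {n} individual(s): P(all gynes) = {prob:.2f}")
```

      Under this assumed value, a colony producing one or two individuals would yield only gynes between a quarter and a half of the time, so observing a handful of such colonies would not be surprising.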

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, as reflected by the fact that there is not just one size peak but several. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We have suggested experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. Author response:

      We thank the editors and the reviewers for their valuable comments. In response to these suggestions, we will add rigorous statistical measures and extend the experimental support of our findings in a revised version. Indeed, as we will show, doing so strengthens all the main claims. Specifically:

      Concerning Reviewer 1:

      - It is important to emphasise that the advantage of deriving shape measures q<sub>p</sub> from Minkowski tensors is their robustness and stability, which is well established by extensive, rigorous mathematical analyses. Introducing q<sub>p</sub> without this connection to revised Minkowski tensors would not allow us to claim this stability property for the considered measures.

      - Even though for a polygon the vertex positions contain the whole geometric information, using q<sub>p</sub> and γ<sub>p</sub> leads to different results, see Fig. 6 for an example.

      - We wholeheartedly agree that our statement on the independence of the values of q<sub>2</sub> and q<sub>6</sub> can be extended and established more quantitatively by rigorous statistical measures. This is exactly what we will do in the revised version, not only providing statistical measures on the presented data, but also extending our analyses to the published data from Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show, these analyses further strengthen this claim, unequivocally establishing the independence of q<sub>2</sub> and q<sub>6</sub> in two different models (active vertex model and multiphase-field model), as well as in two different sets of experiments (those in the original manuscript and the published ones from Armengol-Collado et al., 2023).

      Concerning Reviewer 2:

      To fully address this point, we have extended our analyses to explore the published data of Armengol-Collado JM, Carenza LN, Eckert J, Krommydas D, Giomi L. Epithelia are multiscale active liquid crystals. Nature Physics. 2023; 19:1773–1779. As we shall show in the revised manuscript, the crossover between nematic and hexatic is specific to the use of γ<sub>p</sub> for characterizing the shape and coarse-graining of the associated order. Using q<sub>p</sub> as the shape measure, this crossover disappears. Therefore, these analyses concretely demonstrate that the crossover is not a robust physical feature of the system and depends on the method used to define shape characteristics.

      Concerning Reviewer 3:

      We respectfully note a misunderstanding by the referee: the briefly mentioned approaches of other groups turn out not to measure shape but connections between cells. Conceptually, these approaches are therefore related to bond order parameters. We already comment at the end of the section introducing Minkowski tensors that bond order parameters cannot quantify the shape of a cell. The same argument also holds for other such approaches. In our revised version we will further clarify this distinction, to avoid any confusion or misinterpretation.

    1. Author response:

      As a short response to the public reviews, we would like to outline the following planned revisions:

      (1) Address the antibody concerns as indicated by reviewer 1

      (2) Assess the role of tensin (and possibly KANK), as suggested by reviewers 2 and 3, respectively.

      (3) Validate our main experimental findings using alternative super-resolution approaches, including STED to avoid potential blinking artefacts associated with standard STORM, and most likely DNA-PAINT as a more quantitative technique, as suggested by reviewer 3.

      (4) Implement alternative analytical strategies to DBSCAN, including Voronoi tessellation as suggested by reviewer 3.

      (5) Expand the discussion of the main findings of our work and their biological significance.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Rtf1 HMD domain facilitates global histone H2B monoubiquitination and regulates morphogenesis and virulence in the meningitis-causing pathogen Cryptococcus neoformans" by Jiang et al., the authors employ a combination of molecular genetics and biochemical approaches, along with phenotypic evaluations and animal models, to identify the conserved subunit of the Paf1 complex (Paf1C), Rtf1, and functionally characterize its critical roles in mediating H2B monoubiquitination (H2Bub1) and the consequent regulation of gene expression, fungal development, and virulence traits in C. deneoformans or C. neoformans. Specifically, the authors found that the histone modification domain (HMD) of Rtf1 is sufficient to promote H2B monoubiquitination (H2Bub1) and the expression of genes related to fungal mating and filamentation, and restores the fungal morphogenesis and pathogenicity defects caused by RTF1 deletion.

      Strengths:

      The manuscript is well-written and presents the findings in a clear manner. The findings are interesting and contribute to a better understanding of Rtf1-mediated epigenetic regulation of fungal morphogenesis and pathogenicity in a major human fungal pathogen, and potentially in other fungal species, as well.

      Weaknesses:

      A major limitation of this study is the absence of genome-wide information on Rtf1-mediated H2B monoubiquitination (H2Bub1), as well as a lack of detail regarding the function of the Plus3 domain. Although overexpression of HMD in the rtf1Δ mutant restored global H2Bub1 levels, it did not rescue certain critical biological functions, such as growth at 39 °C and melanin production (Figure 4C-D). This suggests that the precise positioning of H2Bub1 is essential for Rtf1's function. A comprehensive epigenetic landscape of H2Bub1 in the presence of HMD or full-length Rtf1 would elucidate potential mechanisms and shed light on the function of the Plus3 domain.

      We thank the reviewer (and other reviewers) for this excellent suggestion. We have conducted CUT&Tag assays with WT, the _rtf1_Δ mutant, and complemented strains expressing either full-length Rtf1 or only the HMD domain, cultured at 30 and 39 °C. We indeed found that the epigenetic landscape of H2Bub1 differs between strains expressing the HMD and those expressing full-length Rtf1. These results strongly suggest that the distribution of H2Bub1 is regulated by Rtf1, and that H2B modifications at specific loci in the chromosome may contribute to thermal tolerance in C. neoformans. These new findings from the CUT&Tag assays shed light on the mechanism of thermal tolerance; however, we decided not to include these results in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to determine the role of Rtf1 in Cryptococcal biology, and demonstrate that Rtf1 acts independently of the Paf1 complex to exert regulation of Histone H2B monoubiquitylation (H2Bub1). The biological impact of the loss of H2Bub1 was observed in defects in morphogenesis, reduced production of virulence factors, and reduced pathogenic potential in animal models of cryptococcal infection.

      Strengths:

      The molecular data is quite compelling, demonstrating that the Rtf1-dependent functions require only this histone modifying domain of Rtf1, and are dependent on nuclear localization. A specific point mutation in a residue conserved with the Rtf1 protein in the model yeast demonstrates the conservation of that residue in H2Bub1 modification. Interestingly, whereas expression of the HMD alone suppressed the virulence defect of the rtf1 deletion mutant, it did not suppress defects in virulence factor production.

      Weaknesses:

      The authors use two different species of Cryptococcus to investigate the biological effect of Rtf1 deletion. The work on morphogenesis utilized C. deneoformans, which is well-known to be a robust mating strain. The virulence work was performed in the C. neoformans H99 background, which is a highly pathogenic isolate. The study would be more complete if each of these processes were assessed in the other strain to understand if these biological effects are conserved across the two species of Cryptococcus. H99 is not as robust in morphogenesis, but reproducible results assessing mating and filamentation in this strain have been performed. Similarly, C. deneoformans does produce capsule and melanin.

      We thank the reviewer for the suggestion. We have conducted assays to quantify both capsule and melanin production in both the C. neoformans and C. deneoformans strain backgrounds. We found that capsule production was affected in the same pattern in these two serotypes. Interestingly, we found that cell size was significantly affected by deletion of RTF1 in both serotypes. In addition, melanin production was reduced by the deletion of RTF1 in both serotypes; however, complementation with Plus3 or mutated alleles of HMD gave different phenotypes in these two serotypes. These new findings are included in Figure 4 of the revised manuscript.

      There are some concerns with the conclusions related to capsule induction. The images reported in Figure B are purported to be grown under capsule-inducing conditions, yet the H99 panel is not representative of the induced capsule for this strain. Given the lack of a baseline of induction, it is difficult to determine if any of the strains may be defective in capsule induction. Quantification of a population of cells with replicates will also help to visualize the capsular diversity in each strain population.

      We thank the reviewer for raising this concern. We have tested capsule production under capsule-inducing conditions on 10% fetal bovine serum (FBS) agar medium [1]. Under these conditions, the capsule layers surrounding the cells were obvious. We also included a non-capsule-producing control in our assay to aid visualization of the capsule. In addition, we quantified the ratio between the diameters of the capsule layer and the cell body to show the capsular diversity in each strain population. The results are included in Figure 4 of the revised manuscript.

      The authors demonstrate that for specific mating-related genes, the expression of the HMD recapitulated the wild-type expression pattern. The RNA-seq experiments were performed under mating conditions, suggesting specificity under this condition. The authors raise the point in the discussion that there may be differences in Rtf1 deposition on chromatin in H99, and under conditions of pathogenesis. The data that overexpression of HMD restores H2Bub1 by western is quite compelling, but does not address at which promoters H2Bub1 is modulating expression under pathogenesis conditions, and when full-length Rtf1 is present vs. only the HMD.

      We thank the reviewer for raising these concerns. Please see our response to Reviewer #1.

      Reviewer #3 (Public Review):

      Summary:

      In this very comprehensive study, the authors examine the effects of deletion and mutation of the Paf1C protein Rtf1 gene on chromatin structure, filamentation, and virulence in Cryptococcus.

      Strengths:

      The experiments are well presented and the interpretation of the data is convincing.

      Weaknesses:

      Yet, one can be frustrated by the lack of experiments that attempt to directly correlate the change in chromatin structure with the expression of a particular gene and the observed phenotype. For example, the authors observed a strong defect in the expression of ZNF2, a known regulator of filamentation, mating, and virulence, in the rtf1 mutant. Can this defect explain the observed phenotypes associated with the RTF1 mutation? Is the observed defect in melanin production associated with altered expression of laccase genes and altered chromatin structure at this locus?

      We completely agree with the reviewer. We have conducted CUT&Tag assays and examined Rtf1-mediated H2Bub1 at these particular gene loci. We found that the distribution of H2Bub1 at the promoter region of ZNF2 and in the gene body of the laccase-encoding gene varied, possibly due to the RTF1 mutation. We would like to save these preliminary findings for another story and not include them in this manuscript, as we mentioned in the response to Reviewer #1.

      (1) Jang, E.-H., et al., Unraveling Capsule Biosynthesis and Signaling Networks in Cryptococcus neoformans. Microbiology Spectrum, 2022. 10(6): p. e02866-22.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show for the first time that deleting GLS from rod photoreceptors results in the rapid death of these cells. The death of photoreceptor cells could result from loss of synaptic activity because of a decrease in glutamate, as has been shown in neurons, changes in redox balance, or nutrient deprivation.

      Strengths:

      The strength of this manuscript is that the authors show a similar phenotype in mice when Gls was knocked out early in rod development or in the adult rod. They showed that rapid cell death occurs through apoptosis, and that there is an increase in the expression of genes responsive to oxidative stress.

      We thank the reviewer for their time reviewing the manuscript and their comments regarding the potential mechanism(s) by which rod photoreceptors rapidly degenerate upon knockout of GLS.

      Weaknesses:

      In this manuscript, the authors show a "metabolic dependency of photoreceptors on glutamine catabolism in vivo". However, there is a potential bias in their thinking that glutamine metabolism in rods is similar to cancer cells where it feeds into the TCA cycle. They should consider that as in neurons, GLS1 activity provides glutamate for synaptic transmission. The modest rescue shown by providing α-ketoglutarate in the drinking water suggests that glutamine isn't a key metabolic substrate for rods when glucose is plentiful. The ERG studies performed on the iCre-Glsflox/flox mice showed a large decrease in the scotopic b wave at saturating flashes which could indicate a decrease in glutamate at the rod synapse as stated by the authors. While EM micrographs of wt and iCre-Glsflox/flox mice were shown for the outer retina at p14, the synapse of the rods needs to be examined by EM.

      We agree with the reviewer that in the presence of sufficient glucose, it appears a lack of GLS-driven glutamine (Gln) catabolism does not drastically alter the levels of TCA cycle metabolites or mitochondrial function as we demonstrated in Figure 4, and supplementation with alpha-ketoglutarate improved outer nuclear layer thickness by only a small amount as observed in Figure 5e. Hence, as we stated in the Results and Discussion, at least in the mouse where Gls is selectively deleted from rod photoreceptors by crossing Gls<sup>fl/fl</sup> mice with Rho-Cre mice (Gls<sup>fl/fl</sup>; Rho-Cre<sup>+</sup>, cKO), Gln’s role in supporting the TCA cycle is not the major mechanism by which rod photoreceptors utilize Gln to suppress apoptosis.

      With regards to GLS-driven Gln catabolism providing glutamate (Glu) for synaptic transmission, we again agree with the reviewer that Glu is an important excitatory neurotransmitter, but it is also a key metabolite necessary for the synthesis of glutathione, amino acids, and proteins. As noted and discussed at length in the manuscript, a lack of GLS-driven Gln catabolism in rod photoreceptors leads to reduced levels of oxidized glutathione (Figure 4D), possibly signaling an overall reduction in the biosynthesis of glutathione as Glu is directly and indirectly responsible for its synthesis. Furthermore, Gln and GLS-derived Glu play a central role in the biosynthesis of several nonessential amino acids and proteins. To this end, we see a reduction in the level of Glu, which is the product of the GLS reaction and further confirms the loss of GLS function. We also noted a significant decrease in aspartate (Asp), which can be constructed from the carbons and nitrogens of Gln as discussed at length in the manuscript (Figure 6A). Finally, we noted a significant decrease in global protein synthesis in the cKO retina as compared to the wild-type animal as well (Figure 6E). Therefore, the data suggest that GLS-driven Gln catabolism is critical for amino acid metabolism and protein synthesis and, to some degree, redox balance, although the small but statistically significant changes in oxidized glutathione, NADP/NADPH, and redox gene expression may not fully account for the rapid and complete photoreceptor degeneration observed. Future studies are necessary to shed light on the role of redox imbalance in this novel transgenic mouse model.

      Glu also plays a role in synaptic transmission, and we considered this scenario as described in Figure 1 – figure supplement 5. Here, the synaptic connectivity between photoreceptors and the inner retina did not demonstrate significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer nor alterations in the labeling of a key protein (Bassoon) in ribbon synapses. These data suggest that the synaptic connectivity between photoreceptors and second-order neurons was unaltered at P14 in the cKO retina, which is the time just prior to rapid photoreceptor degeneration when Glu was shown to be decreased (Figure 6A).

      With regards to the ERG changes noted in Figure 2, we agree with the reviewer that a large decrease was noted in the scotopic b-wave at P21 and P42 in the cKO. We also agree that, to obtain greater insight into these ERG changes, the ribbon synapse in EM images can be examined. The EM images shown in Figure 1 – figure supplement 4 are from P21, which coincides with the age at which the ERG changes were first noted and when significant photoreceptor degeneration has already occurred. These images were utilized to assess the ribbon synapse for the revised version of the manuscript. As now shown in Figure 1 – figure supplement 4D, ribbon synapses are intact in WT animals as denoted by the yellow boxes. Similarly, the ribbons (yellow arrows) appear structurally intact in the photoreceptors that remain in the P21 cKO retina. These results are in accordance with the lack of significant differences in the labeling of photoreceptor synaptic membranes in the outer plexiform layer as well as the lack of alterations in the labeling of a key protein (Bassoon) in ribbon synapses (Figure 1-figure supplement 5A and B). While we cannot fully rule out that the decrease in glutamate is altering synaptic transmission, our structural data suggest the synapses remain intact. These data have been added to the revised manuscript.

      However, an even larger reduction in the scotopic a-wave was noted at these ages as well. In animal models that disrupt photoreceptor synaptic function (Dick et al. Neuron. 2003; Johnson et al. J Neuroscience. 2007; Haeseleer et al. Nature Neuroscience. 2004; Chang et al. Vis Neurosci. 2006), a more negative ERG pattern is typically observed with the b-wave altered to a much larger degree than the a-wave. Additionally, in these models that disrupt photoreceptor synaptic transmission, the overall structure of the retina with respect to thickness is maintained (Dick et al. Neuron. 2003) or noted to have modest changes in the outer plexiform layer within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). In contrast, a rapid decline in the outer nuclear layer thickness was observed in the cKO retina after P14 likely contributing to the ERG changes noted in Figure 2. Also, Gln is catabolized to Glu primarily by GLS as suggested by the approximately 50% reduction in Glu levels in the cKO retina (Figure 6A), but other enzymes are also capable of catabolizing Gln to Glu, so Glu levels in the rod photoreceptors are unlikely to be zero. Coupling this with the fact that rods are equipped with a self-sufficient Glu recollecting system at their synaptic terminals (Hasegawa et al. Neuron. 2006; Winkler et al. Vis Neurosci. 1999) and that GLS activity is at least two-fold higher in the photoreceptor inner segments, which support energy production and metabolism, than any other layer in the retina (Ross et al. Brain Res. 1987) suggests that altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina.

      The authors note that the outer segments are shorter but they do not address whether there is a decrease in the number of cones.

      We have adjusted Figure 2E by removing the GLS staining to better highlight the secondary degeneration of cone outer segments, the main point of the Figure, as we had already shown that GLS was cleanly knocked out of rod photoreceptors in Figure 1. Furthermore, qualitatively the number of cones appears the same at P14, P21, and P42 between the WT and cKO, which is consistent with other retinal degeneration models, like rd1 and rd10, where cones do not begin to die until all the rods have degenerated (Xue et al. eLife. 2021).

      Rod-specific Gls KO mice with an inducible promoter (IND-cKO) were generated by crossing the Pde6g-CreERT2 line with mice homozygous for either the WT or floxed Gls allele. In Figure 3 the authors document by western blot and antibody labeling that GLS1 expression is lost in the IND-cKO 10 days post tamoxifen. OCT images show a decrease in the thickness of the outer nuclear layer between 17 and 38 days post-TAM. ERGs should be performed on the animals at 10 and 30 days post TAM, before and after major structural changes in rod photoreceptor cells, to determine if changes in light-stimulated responses are observed. These studies could help to parse out the cause of photoreceptor cell death.

      We agree with the reviewer that the IND-cKO is a useful tool to help parse out the cause of photoreceptor cell death in this model as well as shed light on the role of GLS-driven Gln catabolism in photoreceptor synaptic transmission as discussed at length above. Hence, ERG analyses were performed 10 days post TAM, before major structural changes in the ONL are observed. Interestingly, ERG demonstrated statistically significant reductions in the IND-cKO scotopic a- and b-waves as compared to the WT 10 days post TAM. Similarly, photopic ERG demonstrated statistically significant decreases in the b-wave of the IND-cKO retina. These data suggest that GLS-driven Gln catabolism plays a significant role not only in rod photoreceptor survival but in their function as well. These data have been added to Figure 3H-I and are discussed in the corresponding manuscript text.

      To this end, as discussed below and added to Figure 6 – figure supplement 1, amino acid levels, including glutamate (Glu), are already reduced 10 days post TAM. Reductions in the level of Glu may impact synaptic transmission and, as a result, the scotopic b-wave. However, as noted above, altered synaptic transmission secondary to reduced levels of Glu likely does not account in full for the rapid and robust photoreceptor degeneration observed in the cKO retina, as the b-wave to a-wave ratio is not significantly altered in the IND-cKO retina as compared to the WT retina, suggesting that loss of GLS-driven Gln catabolism impairs both to a similar degree.

      Additionally, Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones, which is in accordance with the immunofluorescence image in Figure 3B where GLS is not observed in rod or cone inner segments, unlike in Figure 1B where GLS remains in cones. Hence, the reduction in the photopic b-wave may indicate that loss of GLS-driven Gln catabolism in cones impairs synaptic transmission. As noted in our reply to reviewer #3’s comments, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      The studies in Figure 4 were all performed on iCre-Glsflox/flox and control mice at p14, why weren't the IND-cKO mice used for these studies since the findings would not be confounded by development?

      To gain further insight into the role of GLS-driven Gln catabolism in the maintenance of rod photoreceptors as compared to their development/maturation, we conducted a targeted metabolomic analysis on IND-cKO and WT retinas 10 days post TAM. For the purpose of this manuscript, we have included data regarding changes in amino acid levels in Figure 6 – figure supplement 1. Specifically, levels of glutamate, aspartate and asparagine are all significantly decreased in the IND-cKO retina prior to PR degeneration, which demonstrates that similar to the GLS cKO mouse (i.e. iCre-Gls flox/flox), GLS-driven Gln catabolism is critical for amino acid biosynthesis in mature rod PRs as well.

      In all rescue studies, the endpoint was an ONL thickness, which only addressed rod cell death. The authors should also determine whether there are small improvements in the ERG, which would distinguish the role of GLS in preventing oxidative stress.

      Optical coherence tomography (OCT) provides a sensitive in vivo method to detect small changes in retinal thickness without potential artifacts incurred through histological processing. Considering the Gls cKO retina demonstrates significant and rapid photoreceptor degeneration, we wanted to assess pathways that may be critical to photoreceptor survival downstream of GLS-driven Gln catabolism using rescue experiments with pharmacologic treatment or metabolite supplementation. That said, disruption of GLS-driven Gln catabolism may also significantly alter rod photoreceptor function beyond that which is secondary to photoreceptor cell death as we have demonstrated in the IND-cKO animal for the revised version of this manuscript and discussed in a response above. Therefore, the IND-cKO model provides a unique tool to assess the impact of rescue studies on photoreceptor function as the functional changes occur prior to significant degeneration. Also, unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox) where photoreceptor degeneration starts very early, impairing our ability to capture reliable and robust ERG measurements, the IND-cKO mice are older at the time of functional changes allowing for robust ERG measurements. While the rate of photoreceptor degeneration in both mouse models is similar and the levels of key amino acids are altered similarly in both models, the mechanisms of cell death in developing/maturing photoreceptors may be different than that in mature photoreceptors. Hence, before we can assess if similar rescue experiments impact photoreceptor function via ERG in the IND-cKO mouse, we need to thoroughly examine how these photoreceptors are dying. These experiments and results will be published in a separate manuscript in the future.

      Reviewer #2 (Public Review):

      Summary:

      Photoreceptor neurons are crucial for vision, and discovering pathways necessary for photoreceptor health and survival can open new avenues for therapeutics. Studies have shown that metabolic dysfunction can cause photoreceptor degeneration and vision loss, but the metabolic pathways maintaining photoreceptor health are not well understood. This is a fundamental study that shows that glutamine catabolism is critical for photoreceptor cell health using in vivo model systems.

      Strengths:

      The data are compelling, and the consideration of potential confounding factors (such as glutaminase 2 expression) and additional experiments to examine the synaptic connectivity and inner retina added strength to this work. The authors were also careful not to overstate their claims, but to provide solid conclusions that fit the results and data provided in their study. The findings linking asparagine supplementation and the inhibition of the integrated stress response to glutamine catabolism within the rod photoreceptor cell are intriguing and innovative. Overall, the authors provide convincing data to highlight that photoreceptors utilize various fuel sources to meet their metabolic needs, and that glutamine is critical to these cells for their biomass, redox balance, function, and survival.

      We greatly appreciate the reviewer’s thoughtful comments and time spent reviewing this manuscript.

      Weaknesses:

      Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be of interest to test whether the conditional knockout mice have changes in metabolism (via qPCR such as shown in Figure 4 - Supplemental Figure 1) within the retinal pigment epithelium that may be contributing to the authors' findings in the neural retina. Additionally, the authors have very compelling data to show that inhibition of eIF2a or supplementation with asparagine can delay photoreceptor death via OCT measurements in their conditional knockout mouse model (Figure 6G, H). However, does inhibition of eIF2a or asparagine adversely impact the WT retina? It would also be impactful to know whether this has a prolonged effect, or if it is short-term, as this would provide strength to potential therapeutic targeting of these pathways to maintain photoreceptor health.

      We agree with the reviewer that metabolic communication in the outer retina is crucial to the function and survival of both photoreceptors and RPE. Therefore, we have performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      Additionally, we have added data demonstrating that systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina. Specifically, we performed OCT after 21 days of ISRIB treatment via intraperitoneal delivery in WT mice and show that total retinal, ONL and inner segment/outer segment thickness is unchanged compared to vehicle. These data are now included in Figure 6 – figure supplement 2A. We have also included data to suggest that the effect of ISRIB extends beyond P21 in the cKO mouse. These data, presented in Figure 6 – figure supplement 2B, show that at P28, ISRIB continues to significantly increase ONL thickness compared to vehicle in cKO animals.

      Reviewer #3 (Public Review):

      Summary:

      The authors explored the role of GLS, a glutaminase, which is an enzyme that catalyzes the conversion of glutamine to glutamate, in rod photoreceptor function and survival. The loss of GLS was found to cause rapid autonomous death of rod photoreceptors.

      Strengths:

      Interesting and novel phenotype. Two types of cre-lines were rigorously used to knockout the Gls gene in rods. Both of the conditional knockouts led to a similar phenotype, i.e. rod death. Histology and ERG were carefully done to characterize the loss of rods over specific ages. A necessary metabolomic study was performed and appreciated. Some rescue experiments were performed and revealed possible mechanisms.

      We thank the reviewer for their comments and appreciation of the methods utilized herein to address the role of GLS-driven Gln catabolism in rod photoreceptors.

      Weaknesses:

      No major weaknesses were identified. The mechanism of GLS-loss-induced rod death seems not fully elucidated by this study but could be followed up in the future, and the same for GLS's role in cones.

      We agree with the reviewer that the downstream metabolic and molecular mechanisms by which Gln catabolism impacts rod photoreceptor health are not fully elucidated. Defining these mechanisms will advance our understanding of photoreceptor metabolism and identify therapeutic targets promoting photoreceptor resistance to stress. Future studies are underway to uncover these mechanisms. Additionally, while outside the scope of the current manuscript, we have generated mice lacking GLS in cone photoreceptors specifically and are currently elucidating the role of GLS in cone photoreceptor metabolism, function, and survival. These results will be published in a separate manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) The results could start at line 135, but the first paragraph isn't necessary. The data is published and could be referred to in the introduction.

      We appreciate the reviewer’s suggestion to shorten the beginning of the Results section; however, we believe the supplementary data described in these lines confirm the scRNAseq gene expression data, while adding GLS expression and localization data within the retina. The scRNAseq data and their publication were noted in the Introduction, so we removed the sentence in lines 117-119 that restates these results to shorten this section. We also reduced redundancy by removing an introductory sentence to the second Results paragraph.

      (2) "However, like other metabolically-demanding cells, recent work has demonstrated that PRs have the flexibility to utilize fuel sources beyond glucose to meet their metabolic needs (Adler et al., 2014; Du, Cleghorn, Contreras, Linton, et al., 2013; Grenell et al., 2019; Joyal et al., 2016; Xu et al., 2020)." The paper by Daniele et al. demonstrated that glucose is essential for maintaining the viability of rod photoreceptor cells.

      We thank the reviewer for highlighting published literature that we regrettably overlooked. The reference for Daniele et al. has now been included.

      (3) "Single-cell RNA sequencing data has demonstrated that Gls is expressed throughout the human and mouse retina and much greater than Gls2 (Voigt et al., 2020)." The authors should indicate the specific databases searched in Spectacle.

      We appreciate the reviewer’s attention to detail and have now included the references in the Introduction for GSE63473 from Macosko et al. and GSE142449 from Voigt et al., which were the databases we used in Spectacle to assess Gls levels in the mouse and human retina, respectively.

      References:

      (1) Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015 May 21;161(5):1202-1214. doi: 10.1016/j.cell.2015.05.002. PMID: 26000488; PMCID: PMC4481139.

      (2) Voigt AP, Binkley E, Flamme-Wiese MJ, Zeng S, DeLuca AP, Scheetz TE, Tucker BA, Mullins RF, Stone EM. Single-Cell RNA Sequencing in Human Retinal Degeneration Reveals Distinct Glial Cell Populations. Cells. 2020 Feb 13;9(2):438. doi: 10.3390/cells9020438. PMID: 32069977; PMCID: PMC7072666.

      (4) The immunolabeling in Figure 2 looks like the images are overexposed, and the Gls antibody is labeling the outer segment, not just the inner segment of photoreceptors.

      We thank the reviewer for their comments regarding our immunofluorescence data. There was background staining of the outer segment in both the WT and cKO retina, with decreased GLS staining in the inner segment of the cKO rod photoreceptors at P14 demonstrating loss of GLS in rod photoreceptors similar to Figure 1B. For Figure 2E, we have provided adjusted images with PNA staining only that better represent the secondary cone degeneration that occurs in the rod photoreceptor-specific Gls cKO, which is the take-home point of Figure 2E.

      (5) The authors could use a glutamate antibody to compare it to Gls KO mice as done in Davanger, S., Ottersen, O.P. and Storm-Mathisen, J. (1991), Glutamate, GABA, and glycine in the human retina: An immunocytochemical investigation. J. Comp. Neurol., 311: 483-494. https://doi.org/10.1002/cne.903110404

      We appreciate the reviewer’s suggestion to assess glutamate levels in the wild-type and Gls KO retina via antibody labeling. Our targeted metabolomics studies in Figure 6A provide quantitative evidence that glutamate, the product of the GLS-catalyzed reaction, is decreased as one would expect in the Gls KO retina. The antibody would add to these data by providing the localization of glutamate in the retina. With a rod photoreceptor-specific genetic KO, we would expect glutamate levels to be decreased in these cells. The antibody may also show that glutamate is not only decreased in the rod photoreceptor inner segment, where GLS predominates, but also in the synaptic terminal in accordance with the reviewer’s concerns regarding the impact of GLS KO on synaptic transmission. We have addressed this concern at length above, adding TEM images of the ribbon synapses in the GLS KO retina, and ERG analyses from the IND-cKO animals prior to significant degeneration. In the end, we agree with the reviewer that reduced Glu levels in the GLS cKO retina may impact synaptic transmission to a degree, but the synapses remain intact based on immunofluorescence and TEM analyses and a negative ERG pattern is not observed in the GLS cKO (i.e. iCre-Gls flox/flox) or IND-cKO mouse. As noted above, the structure of the retina in models that disrupt photoreceptor synaptic transmission is maintained (Dick et al. Neuron. 2003) or noted to have modest changes within the first two months of age with the outer nuclear layer not significantly altered until 8-10 months of age (Haeseleer et al. Nature Neuroscience. 2004). So, the impact of the reduced Glu levels on synaptic transmission in the GLS KO retina is unlikely to account in full for the rapid and profound photoreceptor degeneration observed. That said, the IND-cKO mouse, which allows us to assess photoreceptor function prior to significant degeneration unlike the GLS cKO mouse (i.e. iCre-Gls flox/flox), demonstrates that GLS-driven Gln catabolism plays a significant role in photoreceptor function but still does not demonstrate a negative ERG pattern. Therefore, assessing Glu localization in this mouse model 10 days post TAM will be informative as to how GLS-driven Gln catabolism impacts photoreceptor function prior to degeneration. The IND-cKO mouse model is currently being extensively characterized for future publication.

      Reviewer #2 (Recommendations For The Authors):

      Main Concerns:

      (1) The authors checked for Gls2 compensation at P14 in the mouse retina. However, this data would be more compelling with an additional timepoint, particularly at P21 which is used in many of their figures throughout the study.

      We thank the reviewer for their suggestion. Figure 1-figure supplement 1D demonstrates no change in Gls2 gene expression at P14 between the WT and cKO retina. With regards to the reviewer’s concern, in Figure 1-figure supplement 1E of the original submission, we demonstrate that the expression of GLS2 is not increased in the cKO retina at P21 via immunofluorescence.

      (2) Recent studies have explored the metabolic "crosstalk" that exists within the mammalian retina, where metabolites are transferred between the various retinal cells and the retinal pigment epithelium. It would be compelling to see whether the cKO mice have changes in metabolism (via qPCR such as shown in Supplementary Figure 1 for Figure 4) within the RPE that may be contributing to their findings in the neural retina. Additionally, mention of this crosstalk and how it may impact their results should be added to the discussion.

      We appreciate the reviewer’s concern for metabolism changes in the RPE of Gls cKO mice. In agreement with reviewer 2, we performed qRT-PCR on eyecups from cKO and WT mice at P14, prior to photoreceptor degeneration. These data, now included in Figure 4 – figure supplement 2, show no significant changes in genes related to glycolysis, pyruvate metabolism and the TCA cycle in eyecups from cKO mice compared to WT mice at P14. The only exception is a significant decrease in Pdk4 in cKO mouse eyecups compared to WT, which was not observed in retina samples.

      (3) The authors use a tamoxifen-inducible cKO model to support their findings in developed rods. However, in Figure 3A it appears that this model has a greater reduction in GLS compared to the Rho-cre mouse model. Can the authors discuss this? Is this cre more efficient at targeting rods or is it leaky and may have affected other retinal cells?

      We thank the reviewer for pointing out this interesting result associated with using the Pde6g-Cre-ERT2 mouse line. Pde6g is expressed by rods to a significant degree but also by cones (GSE63473, scRNAseq data). Therefore, the IND-cKO mouse likely knocks out GLS from both rods and cones upon TAM induction. To this end, the immunofluorescence image in Figure 3B shows GLS is knocked out in both rod and cone inner segments, unlike in Figure 1B where GLS remains in cones when using the rod photoreceptor-specific Gls<sup>fl/fl</sup> Rho-Cre<sup>+</sup> mouse. As such, as the astute reviewer noted, the greater reduction in GLS protein content demonstrated by Western blot fits with the protein being knocked out of both rods and cones. We have added this note about the mouse model in the corresponding text.

      (4) The authors have very compelling data to show that inhibition of eIF2a can delay photoreceptor death via OCT measurements in their cKO mouse model (Figure 6G). However, does ISRIB adversely impact the WT retina? WT vehicle and ISRIB should be shown. It would also be compelling to know whether this has a prolonged effect, or if it is short-term (i.e. would the effect still be present at P42)?

      We appreciate the reviewer’s comments regarding antagonizing the effects of p-eIF2a to prolong photoreceptor survival in the Gls cKO retina. As described above, we have data demonstrating systemic treatment with ISRIB does not adversely impact the anatomy of the wild-type retina (Figure 6-figure supplement 2A). Specifically, we treated WT animals with daily intraperitoneal ISRIB starting at P5 and performed OCT at P21 to show that total retinal, ONL and inner segment/outer segment thickness is unchanged compared to vehicle-treated WT animals. Additionally, we have included data demonstrating that the photoreceptor neuroprotective effect of ISRIB treatment in the Gls cKO mouse extends beyond P21 (Figure 6-figure supplement 2B).

      (5) For Figure 6H, same as point #4.

      While we have not specifically assessed potential retinal toxicity secondary to systemic Asn supplementation, oral Asn supplementation (up to 100 mg/kg/day) was provided to patients for 24 months and found to be well-tolerated (PMID: 31123592). Allometric scaling of this dose to the mouse would yield a mouse dose of 1234 mg/kg/day, which is much greater than the 200 mg/kg/day dose provided here (PMID: 27057123). Additionally, a 90-day toxicity study of Asn in rats demonstrated a no-observed-adverse-effect level of 1.62 g/kg bodyweight/day in males and 1.73 g/kg bodyweight/day in females (PMID: 18508175). The lower dose in that study equates to a mouse dose of 3.2 g/kg bodyweight/day, well above the mouse dose utilized in this report. As such, future studies should focus on a dose-response relationship with Asn supplementation, and as the reviewer suggested, determining the duration of effect with Asn supplementation.
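
      The allometric comparison in this reply can be checked with a short calculation. The sketch below is illustrative only: it assumes the commonly tabulated body-surface-area (Km) factors of 37 for human, 6 for rat, and 3 for mouse, which are not stated in this response, and the helper name `convert_dose` is hypothetical. Under those assumptions the human dose scales to roughly 1233 mg/kg/day and the male-rat NOAEL to roughly 3240 mg/kg/day, close to the figures quoted, and both sit well above the 200 mg/kg/day given to the mice.

```python
# Assumed Km (body-surface-area) factors from standard dose-conversion tables;
# these are illustrative assumptions, not values reported in the manuscript.
KM = {"human": 37.0, "rat": 6.0, "mouse": 3.0}


def convert_dose(dose_mg_per_kg: float, source: str, target: str) -> float:
    """Scale a mg/kg dose between species by the ratio of their Km factors."""
    return dose_mg_per_kg * KM[source] / KM[target]


# Doses quoted in the response, expressed in mg/kg/day.
human_dose = 100.0        # oral Asn dose tolerated by patients
rat_noael_male = 1620.0   # 1.62 g/kg/day NOAEL in male rats
mouse_dose_used = 200.0   # Asn dose given to mice in this study

human_to_mouse = convert_dose(human_dose, "human", "mouse")
rat_to_mouse = convert_dose(rat_noael_male, "rat", "mouse")

print(f"Human-equivalent mouse dose: {human_to_mouse:.0f} mg/kg/day")
print(f"Rat-NOAEL-equivalent mouse dose: {rat_to_mouse:.0f} mg/kg/day")
print(f"Margin over dose used: {human_to_mouse / mouse_dose_used:.1f}x "
      f"and {rat_to_mouse / mouse_dose_used:.1f}x")
```

      The small gap between this estimate and the quoted 1234 mg/kg/day presumably reflects rounding of the conversion factor; the conclusion that the administered dose lies far below both scaled reference doses does not depend on it.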

      (6) Some of the results section belongs in the introduction or discussion and can be moved.

      We have addressed the reviewer’s concern by moving some of the results to the discussion and removing statements in the results that were either noted in the Introduction or conferred in the Discussion.

      Minor Concerns:

      (1) Scale bar mentions in the figure legends use plural when only one is present, or in some cases are missing. A scale bar should be added to the OCT images if possible.

      We appreciate the reviewer’s attention to detail, and information regarding scale bars has been updated in the figure legends.

      (2) For Figures 1I and J, the sample size changes when J is a quantification of I. Please correct.

      We have corrected the sample size to be consistent between Figures 1I and J.

      (3) In Figure 1 - Figure Supplement 3 the P42 timepoint is not mentioned in the legend. Please correct.

      We have now included the P42 timepoint in the legend for Figure 1 – Figure Supplement 3 as well as in the manuscript text.

      (4) In Figure 1 - Figure Supplement 5 the wrong P value is mentioned in the legend. Please correct.

      We have corrected the P value in the legend for Figure 1 – Figure Supplement 5.

      (5) Can the authors double-check their ERG light intensity settings? They seem high. Please confirm if they are correct.

      We appreciate the reviewer’s concern for ERG light intensity settings and have confirmed the settings used in the study were 32 cd·s/m² and 100 cd·s/m² for scotopic and photopic ERG recordings, respectively.

      (6) The legend key in Figure 2A would be more helpful if the axis were present by the representative traces.

      We thank the reviewer for the suggestion of adding axes to the ERG traces. Figure 2A has been updated to reflect this modification.

      (7) Can the authors check that the error bars are present in Figure 5E?

      We appreciate the reviewer’s concern for error bars in Figure 5E, which are included in the figure. The standard error in this experiment is so small that the symbols overlap with the error bars.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      (1) Figure 6: ISRIB seems to give the most dramatic rescue of cKO GLS in P21 rods. Does it completely prevent rod death? i.e. What's the ONL thickness of P21 WT control? What's the ISRIB rescue of an older cKO animal, say P35?

      The ONL thickness of P21 WT control is on average 0.06 mm (Figure 1E), while the ONL thickness of the Gls cKO retina with ISRIB treatment at P21 is on average 0.044 mm. Therefore, rod death is not completely prevented with ISRIB but rather, rod photoreceptor survival is prolonged. As noted above, we have provided data to demonstrate that the photoreceptor neuroprotective effect of ISRIB lasts beyond P21 (Figure 6-figure supplement 2B).

      (2) What's the mechanistic link between ISR and GLS beyond current speculation? Does GLS have other unknown functions beyond converting glutamine to glutamate? Any novel insights from GLS protein structure?

      We thank the reviewer for this thoughtful question. It is certainly possible that GLS has other functions outside of its role in glutaminolysis. It is well known that other metabolic enzymes have moonlighting functions including hexokinase 2, which has been shown to be important in preventing intrinsic apoptosis through blocking the binding of pro-apoptotic proteins to the mitochondria. While not directly related to ISR, a single report suggests GLS functions non-canonically in Gln-deprived states, promoting mitochondrial fusion to suppress ROS production (PMID: 29934617). Investigating the moonlighting functions of metabolic enzymes is part of our ongoing research program and GLS is included in these studies.

      (3) Just curious about GLS cKO in cones. Any similar phenotype?

      We appreciate the reviewer’s curiosity regarding Gls cKO in cones and this study is currently ongoing with a poster presented at ARVO 2024 (Subramanya et al; Glutaminase-driven glutamine catabolism supports cone photoreceptor metabolism, function, and structure. Invest. Ophthalmol. Vis. Sci. 2024;65(7):193) and a manuscript in preparation. As discussed above, GLS knock out in cones likely impacts their function, in accordance with the data presented at ARVO 2024.

      Recommendations for improving the writing and presentation.

      (1) In the Discussion, lines 458-466, it's incorrect to compare the importance of glucose metabolism to GLS-dependent pathway to photoreceptors in this way. An alternative explanation: glucose metabolism is so important that the system has many redundancies, e.g. HK1 exists in addition to HK2, thus single gene KO leads to no phenotype. The only fair comparison is nutrient deprivation, e.g. taking out glucose or glutamine from retina explants (Punzo et al., 2009).

      The reviewer makes an excellent point. While we do not see an upregulation of GLS2 in the retina or rod PRs upon GLS knockout (Figure 1-figure supplement 1 D and E), loss of Gls in rod PRs does alter the expression of many metabolism-related genes (Figure 4-figure supplement 1).  We alluded to these data and the reviewer’s point in the second paragraph of the discussion: “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose or rewire their metabolism to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      In the revised article, we have amended these sentences to include the importance of metabolic redundancies. “In any of these transgenic mouse models, PRs may use other transporters to take up fatty acids or glucose, rewire their metabolism, or utilize metabolic redundancies to maintain metabolic homeostasis and stave off degeneration (Subramanya et al., 2023; Wubben et al., 2017). Our data show that any metabolic reprogramming that is occurring in the cKO mouse retina appears unable to significantly circumvent the significant and rapid PR degeneration suggesting the importance of Gln catabolism in rod PRs. Furthermore, inducing GLS knockdown in mature PRs also demonstrated rapid PR degeneration (Figure 3).”

      (2) Please discuss the mosaic activity of Rho-cre used in this study, as described in the original study (Le et al 2006). Line 221 (Li et al 2005) seems to be a different Rho-Cre created by a different group. Please make sure the citation is correct and consistent.

      We apologize for the confusion and have corrected the reference on line 221 to Le et al., 2006. The reviewer is correct that the original report (Le et al. 2006) demonstrated a mosaic of Cre-mediated recombination in rod photoreceptors and rod bipolar cells in the mouse line that had the shorter (0.2 kb) mouse opsin promoter-controlled Cre. In contrast, this same report showed only Cre-mediated recombination in rod photoreceptors in another line that utilized a long (4.1 kb) mouse opsin promoter-controlled Cre. We have published using this latter promoter-controlled Cre recombinase in at least 5 different mouse models (Wubben et al. 2017; Weh et al. 2020; Weh et al. 2023; Subramanya et al. 2023; the current report), and in all these models, we observe clear and consistent knockout by immunofluorescence only in rod photoreceptors with residual protein in cones and no significant change in protein expression in the INL where bipolar cells reside. Western blots confirm the reduction in protein expression.

      (3) The authors should provide representative images of retina cross-sections for key rescue data (Figure 6G&H).

      As requested by Reviewer 3, representative histology images of retina cross-sections for the ISRIB and Asn rescue experiments in Gls cKO mice at P21 are now included in the manuscript in Figure 6 – figure supplement 3.

      Minor corrections to the text and figures.

      (1) Spell out Gln in the Abstract when used for the first time.

      We have included glutamine (Gln) in the abstract upon first use.

      (2) Line 433, Figure 6G should be 6H.

      Thank you for the correction; the manuscript has been updated.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The study aimed to investigate the significant impact of criterion placement on the validity of neural measures of consciousness, examining how different standards for classifying a stimulus as 'seen' or 'unseen' can influence the interpretation of neural data. They conducted simulations and EEG experiments to demonstrate that the Perceptual Awareness Scale, a widely used tool in consciousness research, may not effectively mitigate criterion-related confounds, suggesting that even with the PAS, neural measures can be compromised by how criteria are set. Their study challenged existing paradigms by showing that the construct validity of neural measures of conscious and unconscious processing is threatened by criterion placement, and they provided practical recommendations for improving experimental designs in the field. The authors' work contributes to a deeper understanding of the nature of conscious and unconscious processing and addresses methodological concerns by exploring the pervasive influence of criterion placement on neural measures of consciousness and discussing alternative paradigms that might offer solutions to the criterion problem.

      The study effectively demonstrates that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' significantly impacts the validity of neural measures of consciousness. The authors found that conservative criteria tend to inflate effect sizes, while liberal criteria reduce them, leading to potentially misleading conclusions about conscious and unconscious processing. The authors employed robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence. The results from both experiments confirm the predicted confounding effects of criterion placement on neural measures of unconscious and conscious processing.

      The results are consistent with their hypotheses and contribute meaningfully to the field of consciousness research.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy in consciousness and unconsciousness, utilizing either simulated data or reanalyzing experimental data with post-hoc sorting data.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) In the realm of research methodology, conducting post-hoc sorting based on subject reports raises an issue. This operation leads to an imbalance in the number of trials between the two conditions (Target and NonTarget) during the decoding process. Such trial number disparity introduces bias during decoding, likely contributing to fluctuations in neural decoding performance. This potential confounding factor significantly impacts the interpretation of research findings. The trial number imbalance may cause models to exhibit a bias towards the category with more trials during the learning process, leading to misjudgments of neural signal differences between the two conditions and failing to accurately reflect the distinctions in brain neural activity between target and non-target states. Therefore, it is recommended that the authors extensively discuss this confounding factor in their paper. They should analyze in detail how this factor could influence the interpretation of results, such as potentially exaggerating or diminishing certain effects, and whether measures are necessary to correct the bias induced by this imbalance to ensure the reliability and validity of the research conclusions.

      We would like to thank reviewer 2 for their positive words and for taking the time to evaluate our manuscript. In response to this asserted weakness, we would like to point out that the issue of trial imbalances was already comprehensively addressed in the manuscript. No trial imbalances are present in the analyzed data for any of the conditions, so that none of our reported results could have been impacted by this. This was done through the following set of measures:

      (1) Training data (methods section): “a linear discriminant analytic (LDA) classifier was trained for each participant using all trials from all sessions (3 sessions in Experiment 1, 2 sessions in Experiment 2) to discriminate target from no-target trials based on EEG data, irrespective of seen/unseen responses and irrespective of the response criterion. To maximize signal-to-noise ratio, we applied a leave-one-person-out cross validated decoding scheme by using all classifiers from all participants except the participant that was being tested (separately for Experiment 1 and for Experiment 2). This leave-one-person-out cross validation procedure maximized the available data for training without requiring k-folding on subsets of cells with low response counts, so that all test sets were classified by the same fully independent classifiers. A single time series of classification performance across time was obtained for every participant (every testing set) by averaging classification performance across all classifiers that tested that set (see Methods and supplementary Figure S2 for details).”

      This leave-one-person-out cross validation scheme made sure that no trial selection needed to be performed to analyze conservative or liberal conditions. Both conditions were classified using the same classifiers, trained on all data from the other participants.

      (2) Testing data (methods section): “To ensure that differences resulting from post hoc sorting could not be explained by differences in signal-to-noise ratio resulting from disparities in trial counts in the testing set, we equated trial counts between the liberal and conservative condition within each participant by randomly selecting the same number of trials from overrepresented cells (for Experiment 1, this was done at the level of ‘seen’ and ‘unseen’ responses, for Experiment 2 the trial counts were equated at each of the PAS levels, see methods for details). As a result, response-contingent conditions in the liberal and conservative conditions had identical input for all classification analyses. Although different trial counts in the testing set might affect the precision with which AUC is estimated in a decoding analysis, it does not affect the size of AUC itself. Trial count equation was merely performed to make sure the liberal and conservative condition were as comparable as possible.”

      Indeed, we also report at the end of this section that running the same analyses without selecting trials in the test set yielded qualitatively identical results: “Analyzing the data without equating trial counts resulted in qualitatively identical results.”
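
      The trial-count equation described in the quoted methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name, the dictionary layout (response label mapped to a list of trial identifiers), and the fixed random seed are all hypothetical.

```python
import random


def equate_trial_counts(liberal, conservative, seed=0):
    """Randomly subsample trials so each response cell has equal counts.

    liberal / conservative: dict mapping a response label (e.g. 'seen',
    'unseen', or a PAS level) to a list of trial identifiers.
    Labels present in only one condition are dropped (an assumption of
    this sketch, since they cannot be matched).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    out_lib, out_con = {}, {}
    for label in sorted(set(liberal) & set(conservative)):
        # The smaller cell sets the target count; the larger one is subsampled.
        n = min(len(liberal[label]), len(conservative[label]))
        out_lib[label] = rng.sample(liberal[label], n)
        out_con[label] = rng.sample(conservative[label], n)
    return out_lib, out_con


# Hypothetical trial identifiers for one participant.
liberal = {"seen": list(range(10)), "unseen": list(range(100, 104))}
conservative = {"seen": list(range(200, 206)), "unseen": list(range(300, 309))}

lib_eq, con_eq = equate_trial_counts(liberal, conservative)
print({k: len(v) for k, v in lib_eq.items()})
print({k: len(v) for k, v in con_eq.items()})
```

      After subsampling, each response category contributes the same number of test trials in the liberal and conservative conditions, which is the property the quoted passage relies on.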

      To remove any lack of clarity about this, we now also briefly report in the beginning of the discussion section that the results cannot be explained by unequal trial counts:

      “We found that in both experiments, criterion shifts modulated effect size in neural measures of ‘unconscious’ (unseen) and/or ‘conscious’ (seen) processing, and that this happens even though the conservative and liberal condition used the same independent training data (identical classifiers), and even though the trial counts in the test sets were equated for the conservative and liberal condition.”

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participant reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.

      Strengths and Weaknesses

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      Our initial review identified a lack of measures of variance as one potential weakness of this work. However, we agree with the authors' response that plotting individual datapoints for each condition is indeed a good visualization of variance within a dataset.

      Impact of the Work:

      This study effectively demonstrates a phenomenon that, while understood within the context of signal detection theory, has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      (1) The rationale for performing genomics, transcriptional, and proteomics work in 293T cells is not discussed. Further, there are no functional readouts mentioned in the 293T cells with expression of the fusion-oncogenes. Did these cells have any phenotypes associated with fusion-oncogene expression (proliferation differences, morphological changes, colony formation capacity)? Further, how similar are the gene expression signatures from RNA-seq to rhabdomyosarcoma? This would help the reader interpret how similar these cell models are to human disease.

      We appreciate the reviewer’s comments and understand the limitation of HEK293T cell culture. HEK293T cells were used as a surrogate system that enabled us to systematically examine and compare the transcriptional activation mechanisms between VGLL2-NCOA2/TEAD1-NCOA2 and YAP/TAZ. HEK293T cells have previously been used as a model system to study the signaling and transcriptional mechanisms of the Hippo/YAP pathway (1,2). Our data also showed that the ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 in HEK293 cells can promote proliferation (Figure 1-figure supplement 1B), consistent with their potential oncogenic function.

      (2) TEAD1::NCOA2 fusion-oncogene model was not credentialed past H&E, and expression of Desmin. Is the transcriptional signature in C2C12 or 293T similar to a rhabdomyosarcoma gene signature?

      We understand the reviewer’s concern. A VGLL2-NCOA2 in vivo tumorigenesis model generated by C2C12 cell orthotopic transplantation has recently been reported, and it exhibits characteristics similar to those of zebrafish transgenic tumors as well as human scRMS samples that carry the VGLL2-NCOA2 fusion (3). Due to the similar transcriptional and oncogenic mechanisms employed by both VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins, we expect that the TEAD1-NCOA2 dependent C2C12 transplantation model will closely resemble that induced by VGLL2-NCOA2.

      (3) For the fusion-oncogenes, did the HA, FLAG, or V5 tag impact fusion-oncogene activity? Was the tag on the 3' or 5' of the fusion? This was not discussed in the methods.

      To address the reviewer’s concern, we carefully compared the transcriptional activity of the fusion proteins with the HA tag at the 5’ end or FLAG and V5 tag at the 3’ end. We found that neither the tag type nor its location significantly affects the ability of VGLL2-NCOA2 and TEAD1-NCOA2 to induce downstream gene transcription, measured by qPCR. The data is summarized in Figure 1-figure supplement 1 G-H.

      (4) Generally, the lack of details in the figures, figure legends, and methods make the data difficult to interpret. A few examples are below:

      a. Individual data points are not shown for figure bar plots (how many technical or biological replicates are present and how many times was the experiment repeated?).

      As requested, we have added the individual data points to the bar plots. The Method section now includes information on the number of biological replicates and the times the experiments were repeated.

      b. What exons were included in the fusion-oncogenes from VGLL2 and NCOA2 or TEAD1 and NCOA2?

      We have now included the exon structure organization of VGLL2-NCOA2 or TEAD1-NCOA2 fusions in Figure 1-figure supplement 1A.

      c. For how long were the colony formation experiments performed? Two weeks?

      We have included more detailed information about the colony formation assay in the Methods section.

      d. In Figure 2D, what concentration of CP1 was used and for how long?

      The CP1 concentration and treatment duration information has now been included in the figure legend and Methods section.

      e. How was A485 resuspended for cell culture and mouse experiments, what is the percentage of DMSO?

      The Methods section now includes detailed information on how A485 is prepared for in vitro and in vivo experiments.

      f. How many replicates were done for RNA-seq, CUT&RUN, and ATACseq experiments?

      RNA-seq was done with three biological replicates and CUT&RUN and ATAC-seq were performed with two biological replicates. This information is now included in the Methods section for clarification.

      Reviewer #2 (Public Review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Gu et al. studied two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS) and showed that their fusion proteins can activate Hippo downstream gene transcription independent of YAP/TAZ. Using the BioID-based mass spectrometry analysis, the authors revealed histone acetyltransferase CBP/p300 as specific binding proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibited the fusion proteins-induced Hippo downstream gene transcription and tumorigenic events.

      Overall, this study provides mechanistic insights into the scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow.

      Here, several suggestions are made for the authors to improve their study.

      Main points

      (1) The authors majorly focused on the Hippo downstream gene transcription in this study, while a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Figure 3). The authors should investigate whether the altered Hippo pathway transcription is essential for VGLL2-NCOA2 and TEAD1-NCOA2-induced cell transformation and tumorigenesis. Specifically, they should test if treatment with the TEAD inhibitor can reverse the cell transformation and tumorigenesis caused by VGLL2-NCOA2 but not TEAD1-NCOA2. In addition, it is important to examine whether YAP-5SA expression can rescue the inhibitory effects of A485 on VGLL2-NCOA2 and TEAD1-NCOA2-induced colony formation and tumor growth. This will help clarify whether Hippo downstream gene transcription is important for the oncogenic activities of these two fusion proteins.

      We thank the reviewer for the comments. Although we have not tested the small-molecule TEAD inhibitor on VGLL2-NCOA2 or TEAD1-NCOA2-induced cell transformation and tumorigenesis, we expect that TEAD inhibition will block VGLL2-NCOA2- but not TEAD1-NCOA2-induced oncogenic activity. This is because TEAD1-NCOA2 does not contain the auto-palmitoylation sites and the hydrophobic pocket in the C-terminal YAP-binding domain of TEAD1 that the TEAD small-molecule inhibitor occupies (4). We also appreciate the reviewer’s suggestion of YAP5SA rescue experiments. However, due to its strong oncogenic activity, YAP5SA itself can induce robust downstream transcription and cell transformation with or without A485 treatment, as shown in Figure 5. Thus, this experiment is unlikely to address whether non-Hippo downstream genes induced by the fusions are important for cell transformation and tumorigenesis. Because of the distinct nature of transcriptional and chromatin landscapes controlled by VGLL2-NCOA2/TEAD1-NCOA2 and YAP, we speculate that both Hippo and non-Hippo-related downstream genes contribute to the oncogenic activation and tumor phenotypes induced by the fusion proteins.

      (2) Rationale for selecting CBP/p300 for functional studies needs to be provided. The BioID-MS experiment identified many interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins (Table S4). The authors should explain the scoring system used to identify the high-interacting proteins for VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Was CBP/p300 among the top candidates on the list? Providing this information will help justify the focus on CBP/p300 and validate their importance in this study.

      We appreciate the reviewer’s point. CBP/P300 is among the top hits in our proteomics screens of both VGLL2-NCOA2 and TEAD1-NCOA2. Our focus on CBP/P300 is mainly due to the well-established interactions between CBP/P300 and the NCOA family transcriptional co-activators, in which the CBP/P300-NCOA complex plays a central role in mediating nuclear receptor-induced transcriptional activation (5). In addition, our data are consistent with another recurrent VGLL2 fusion identified in scRMS, VGLL2-CITED2 (6), which has a C-terminal fusion partner from CITED2, a known CBP/P300-interacting protein (7).

      (3) p300 was revealed as a key driver for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins-induced transcriptome alteration and tumorigenesis. To strengthen the point, the authors should identify the p300 binding region on VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Mutants with defects in p300 binding/recruitment should be generated and included as a control in the related q-PCR and tumorigenic studies. This work will help confirm the crucial role of p300 in mediating the oncogenic effects of these two fusion proteins.

      We thank the reviewer for the suggestion. We have performed additional co-immunoprecipitation experiments using a deletion mutant form of VGLL2-NCOA2 and demonstrated that the C-terminal NCOA2 part of the fusion is responsible for mediating the interaction between the fusion protein and CBP/P300. These results are now included in the new Figure 5A and are consistent with the reported structural analysis of the CBP/P300-NCOA complex (8). In addition, our new data showed the inability of the VGLL2-NCOA2 ∆NCOA2 mutant to induce gene transcription (Figure 1-figure supplement 1D). Furthermore, our data using the small-molecule CBP/P300 inhibitor clearly demonstrated that CBP/P300 is required to mediate cell transformation and tumorigenesis induced by the two fusion proteins in vitro and in vivo (Figures 5 and 6).

      (4) Another major issue is the overexpression system extensively used in this study. It is important to determine whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in cancer. If not, the expression levels of the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins should be adjusted to endogenous levels to assess their oncogenic effects on gene transcription and tumorigenesis. This approach would make the study more relevant to the pathological conditions observed in scRMS cancer patients.

      We appreciate the reviewer’s input and acknowledge the limitation of the HEK293T and C2C12 cell-based models that rely on ectopic expression of VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. It is currently unclear whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in sarcoma. As mentioned before, these surrogate cell culture systems allowed us to systematically compare the transcriptional regulation by the fusion proteins and YAP/TAZ and elucidate the molecular mechanism underlying the Hippo/YAP-independent oncogenic transformation induced by VGLL2-NCOA2 and TEAD1-NCOA2.

      References:

      (1) Genes Dev. 2007 Nov 1;21(21):2747-61. doi: 10.1101/gad.1602907. Inactivation of YAP oncoprotein by the Hippo pathway is involved in cell contact inhibition and tissue growth control

      (2) Genes Dev. 2010 Jan 1;24(1):72-85. doi: 10.1101/gad.1843810. A coordinated phosphorylation by Lats and CK1 regulates YAP stability through SCF(beta-TRCP)

      (3) VGLL2-NCOA2 leverages developmental programs for pediatric sarcomagenesis. Watson S, LaVigne CA, Xu L, Surdez D, Cyrta J, Calderon D, Cannon MV, Kent MR, Cell Rep. 2023 Jan 31;42(1):112013.

      (4) Lats1/2 Sustain Intestinal Stem Cells and Wnt Activation through TEAD-Dependent and Independent Transcription. Cell Stem Cell. 2020 May 7;26(5):675-692.e8.

      (5) Yi, P., Yu, X., Wang, Z., and O’Malley, B.W. (2021). Steroid receptor-coregulator transcriptional complexes: new insights from CryoEM. Essays Biochem. 65, 857–866.

      (6) A Molecular Study of Pediatric Spindle and Sclerosing Rhabdomyosarcoma: Identification of Novel and Recurrent VGLL2-related Fusions in Infantile Cases. Am J Surg Pathol. 2016 Feb;40(2):224-35. doi: 10.1097/

      (7) CITED2 and the modulation of the hypoxic response in cancer. Fernandes MT, Calado SM, Mendes-Silva L, Bragança J. World J Clin Oncol. 2020 May 24;11(5):260-274.

      (8) Yu, X., Yi, P., Hamilton, R.A., Shen, H., Chen, M., Foulds, C.E., Mancini, M.A., Ludtke, S.J., Wang, Z., and O’Malley, B.W. (2020). Structural insights of transcriptionally active, full-length Androgen receptor coactivator complexes. Mol. Cell 79, 812–823.e4.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. The study of larger cohorts will certainly be needed to determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which was primarily ensured by studying females only and by having a single experienced observer make all diagnoses.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to the implication of HERVs in ME/CFS. We are indeed considering this avenue in future work to build on the findings presented here. However, the limited knowledge of HERV-mediated physiological functions may delay results that reveal the causes and effects of HERV expression in ME/CFS and FM.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking a better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analyses of this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, we considered the low number of probes (fewer than 100) excluded from our analysis for lack of correspondence with hg38, among the 1,290,800 probesets, to be insignificant for "genome-wide" claims. This aspect will be explained in the revised version of the manuscript.
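
      The scale of this exclusion can be checked with simple arithmetic. The sketch below is illustrative only: it treats 100 as an upper bound on the number of excluded probes (the response says only "less than 100") and uses the probeset total quoted above as the denominator; the variable names are hypothetical.

```python
# Upper bound on probes lost in the hg19 -> hg38 liftover (assumption: "less than 100").
excluded_upper_bound = 100
# Total number of probesets quoted in the response.
total_probesets = 1_290_800

# Share of the array that could be affected, expressed as a percentage.
percent_excluded = 100 * excluded_upper_bound / total_probesets
print(f"Excluded share: at most {percent_excluded:.4f}% of probesets")
```

      Under these assumptions the excluded share stays below one hundredth of one percent of the array.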

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust to current evidence.

      The revised sentences can now be found in lines 397-402.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERV as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERV (cluster 4) associate with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6). The impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing the genomic locations of DE HERVs in these pathologies would help others build on these findings. Unfortunately, we do not hold the rights to share probe coordinates from the custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations/questions:

      (1) The authors point towards the biomarker potential of HERV expression signatures. In line with this, it would be important to test if they can predict the correct pathology for patients using the expression of DE HERVs. Additionally, as a single clinician annotated the cohort analysed in this study, it would be interesting to validate the signatures identified in this work by reanalysing publicly available transcriptomic data from independent studies.

      Thank you for the suggestion. We plan to conduct this analysis and have added the following statement to the manuscript (lines 482-483): “Given the limited sample size in our cohort, validation of the findings in extended cohorts is a must.”

      (2) The authors suggest that an epigenetic mechanism causes the dysregulated HERV expression in ME/CFS patients. However, in Fig.1A, HERV expression profiles of co-diagnosed patients are more similar to healthy controls than patients with either condition. How could the co-morbidity of FM "rescue" the phenotype of ME/CFS?

      Thank you for the insightful comment. It is notable that co-diagnosed patients exhibit HERV expression profiles more similar to those of healthy controls than to those of patients with FM or ME/CFS alone. These findings may suggest a distinct underlying pathomechanism for this patient group, supporting the identification of a novel nosologic entity, as discussed in lines 372-374 of the manuscript.

      (3) Abundant evidence in the literature links HERV dysregulation with the production of RNA:DNA hybrids and dsRNAs and viral mimicry. The authors found that ME/CFS subgroup 2, which exhibits the most important HERV dysregulation, is also associated with decreased signatures of pathogen detection. It would be interesting to quantify the abundance of DNA:RNA hybrids and dsRNAs in PBMCs of ME/CFS and FM patients as well as healthy controls. It would be interesting to discuss how downregulation of pathogen detection pathways could be a mechanism in ME/CFS patients to avoid viral mimicry and potential links with inflammation in this disease.

      Certainly, HERVs can influence disease pathophysiology by generating RNA:DNA hybrids and dsRNA. However, microarray data does not allow this analysis. Future actions to investigate the underlying mechanisms of differentially expressed HERVs could investigate this interesting possibility.

      (4) Another intriguing result is how overexpression of Module 3 in ME/CFS subgroup 2 is associated with higher levels of plasma cells. The authors hypothesize that the changes in immune cell abundances reflect previous viral infections, but another possibility would be immune activation against HERVs. Are there protein-coding sequences (gag, pro, pol, env) amongst the HERV sequences of module 3? If so, it would be interesting to validate HERV protein expression in these samples. Additionally, blood samples of ME/CFS patients and healthy controls should be analysed in flow cytometry to describe the abundance and phenotype of immune cells precisely.

      Thank you for your insightful comments. In fact, we identified three HERV elements with protein-coding regions whose functional relevance remains uncertain. They present an interesting avenue for future investigation, particularly regarding immune activation.

      Minor comments:

      (1) On lines 170-172, it is unclear to me how Figure 1E is linked to the text.

      We have added a line better explaining Fig. 1E: “Top 10 contributing HERVs to principal components PC1 and PC2 are shown” (lines 171-172).

      (2) Figure S2: grouping or colouring the plots based on the cluster to which HERVs were assigned could facilitate the understanding of the figure.

      We appreciate the suggestion to enhance the clarity of the figures. However, this color-coding cannot be implemented, as a family is not exclusively assigned to a single cluster.

      (3) How are the 4 HERV clusters of Figure 2 and the 8 modules of Figure 3 related to the clusters identified by hierarchical clustering in Figure 1? More details should be provided in the text (Results and Methods sections), and figures to illustrate the clustering strategy should be added if needed.

      To enhance clarity, we have included the following explanation in the results section (lines 244-251): “To uncover potentially affected physiologic functions linked to DE HERV, we examined how DE HERVs and DE genes with similar expression patterns grouped together in modules based on their intrinsic relationships by their hierarchical co-clustering (Fig. 3). Then, the functional significance of these modules was assessed by gene ontology (GO) analysis of the DE genes within each module. The hierarchical clustering analysis resulted in the identification of eight distinct modules, each characterized by unique combinations of DE HERV and DE gene patterns across all four study groups (Fig. 3)”.

      (4) Related to Figure 4, are there HERV sequences in module 3 located near genes important for plasma cells and/or resting CD4 memory T cells?

      Thank you for your insightful comment. However, gene relevance for plasma cells and/or resting CD4 memory T cells may depend on multiple factors in addition to cell type and subtype, and therefore the analysis may not be straightforward.

      Reviewer #2 (Recommendations for the authors):

      In Figure 1, the heatmap scale goes from -4 to 4. This should reflect at least the numbers on the lowest and highest end of the scale.

      Thank you for bringing this to our attention. The scale was correct; however, when arranging the panels, the numbers were not properly positioned. The figure has now been updated with the corrected version.

      Figure 2F and G, percentages are shown as decimal numbers up to 1.00, while it should be 100%, and so on.

      We also replaced this figure, changing the numbers to fit percentages.

      It would be interesting to know how the results change using FDR of 0.05. I'm not familiar with microarray thresholds, but in RNA-Seq, 0.1 is rarely used, with 0.05 being the standard. Could it be that a more stringent result better distinguishes the pathologies?

      Applying a more stringent threshold, such as FDR 0.05, may remove sequences that, while not strongly differentially expressed, may still be important for distinguishing between these pathologies. Therefore, we decided to also include DE tendencies (FDR<0.1) in this first-of-its-kind study. The findings will need validation in larger cohorts.
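
      The trade-off between the two thresholds can be illustrated with a small sketch. It assumes the FDR values are Benjamini-Hochberg adjusted p-values, which the response does not state, and it runs on hypothetical p-values rather than study data; the function name `benjamini_hochberg` is likewise illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for eight probesets (not data from the study).
p_values = [0.001, 0.004, 0.012, 0.03, 0.045, 0.06, 0.20, 0.70]
q_values = benjamini_hochberg(p_values)

# Probesets retained at the stringent and the lenient threshold.
strict = [i for i, q in enumerate(q_values) if q < 0.05]
lenient = [i for i, q in enumerate(q_values) if q < 0.10]
print(f"FDR < 0.05 keeps {len(strict)} probesets; FDR < 0.10 keeps {len(lenient)}")
```

      In this toy example every probeset that passes the stringent cut also passes the lenient one, so relaxing the threshold only adds weaker candidates and never drops strong ones.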

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the interaction between tissue-resident immune cells (microglia) and circulating systemic neutrophils in response to acute, focal retinal injury. They induced retinal lesions using 488 nm light to ablate photoreceptor (PR) outer segments, then utilized various imaging techniques (AOSLO, SLO, and OCT) to study the dynamics of fluorescent microglia and neutrophils in mice over time. Their findings revealed that while microglia showed a dynamic response and migrated to the injury site within a day, neutrophils were not recruited to the area despite being nearby. Post-mortem confocal microscopy confirmed these in vivo results. The study concluded that microglial activation does not recruit neutrophils in response to acute, focal photoreceptor loss, a scenario common in many retinal diseases.

      Strengths:

      The primary strength of this manuscript lies in the techniques employed.

      In this study, the authors utilized advanced Adaptive Optics Scanning Laser Ophthalmoscopy (AOSLO) to document immune cell interactions in the retina accurately. AOSLO's micron-level resolution and enhanced contrast, achieved through near-infrared (NIR) light and phase-contrast techniques, allowed visualization of individual immune cells without extrinsic dyes. This method combined confocal reflectance, phase-contrast, and fluorescence modalities to reveal various cell types simultaneously. Confocal AOSLO tracked cellular changes with less than 6 μm axial resolution, while phase-contrast AOSLO provided detailed views of vascular walls, blood cells, and immune cells. Fluorescence imaging enabled the study of labeled cells and dyes throughout the retina. These techniques, integrated with conventional histology and Optical Coherence Tomography (OCT), offered a comprehensive platform to visualize immune cell dynamics during retinal inflammation and injury.

      Thank you!

      Weaknesses:

      One significant weakness of the manuscript is the use of Cx3cr1GFP mice to specifically track GFP-expressing microglia. While this model is valuable for identifying resident phagocytic cells when the blood-retinal barrier (BRB) is intact, it is important to note that recruited macrophages also express the same marker following BRB breakdown. This overlap complicates the interpretation of results and makes it difficult to distinguish between the contributions of microglia and infiltrating macrophages, a point that is not addressed in the manuscript.

      We agree that the manuscript should place greater emphasis on the fact that CX3CR1 mice exhibit fluorescence not only in microglia, but also in other cells of macrophage origin, including monocytes, perivascular macrophages and some hyalocytes.

      Through the advantages of in vivo AOSLO, however, we are able to establish that CX3CR1 cells are present within the tissue before the laser lesion is placed. This suggests they are tissue resident. We agree that it is possible that at later time points (days-weeks), systemic macrophages and/or monocytes may participate. The lack of rolling/crawling cells suggests they are not systemic. We elaborate on this point in a new section in the discussion:

      P29 L534-541:

      “CX3CR1-GFP mice exhibit fluorescence not only in microglia

      We recognize that the CX3CR1-GFP model can also label systemic cells such as monocytes/macrophages77. While it is possible these cells could infiltrate the retina in response to the lesion, we find it unlikely since there was no indication of the leukocyte extravasation cascade (rolling/crawling/stalled cells) within the nearest retinal vasculature. In addition to microglia, retinal perivascular macrophages and hyalocytes also exhibit GFP fluorescence and thus these cells may also contribute toward damage resolution.”

      Another major concern is the time point chosen for analyzing the neutrophil response. The authors assess neutrophil activity 24 hours after injury, which may be too late to capture the initial inflammatory response. This delayed assessment could overlook crucial early dynamics that occur shortly after injury, potentially impacting the overall findings and conclusions of the study.

      The power of in vivo imaging makes these early assessments possible. Therefore, we have taken the reviewer's concern on board and conducted an additional experiment that examines whether neutrophils are seen in the window of time between the lesion and 24 hours. In a newly examined mouse, we find that within 3.5 hours post-lesion, neutrophils do not extravasate adjacent to the lesion site (see new “figure 8 – figure supplement 1”).

      Also see accompanying video (new “figure 8 – video 3”) for an example of nearby neutrophils flowing through OPL capillaries just microns away from the lesion site. Neutrophils are clearly contained within the vasculature and exhibit dynamics consistent with healthy retinal tissue. While it remains possible that the lesion may increase leukocyte stalling within the nearest capillaries, we are unable to confirm or deny this with a single experiment. We now submit this evidence as a new supplementary figure following the reviewer’s suggestion.

      Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by an ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      We appreciate this recognition and hope that the reviewer considers the weaknesses below in the context of the paper's identified strengths.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      We agree and have taken the following steps to address this:

      (1) Paper has been shortened overall by 8%

      (2) We reorganized the following sections:

      a. Introduction: shortened

      b. Methods: merged section “Ex vivo confocal image processing” with “Ex vivo confocal imaging”.

      c. Results: most sections shortened, others simplified for concision

      d. Discussion: most sections shortened, removed “Microglial/neutrophil discrimination using label-free phase contrast”

      e. Figure references reorganized in order of their appearance.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury.

      On the heels of this burgeoning technology, we consider this report among the first studies of its kind. We are hopeful that it forms the foundation of many further investigations to come. We expect a rich parameter space to be explored with future studies including investigation of other time points, other injuries of varying degree and other immune cell populations (along with their interactions with each other). Each has the potential to reveal the complexities of the ocular immune system in action.

      Reviewer #3 (Public review):

      Summary:

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature, and the pigment epithelium. As early as 1-day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network constituting one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience. However, there are some issues with the conclusions drawn from the data and analysis that can be addressed to further bolster the manuscript.

      Strengths:

      Adaptive optics imaging of the murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is beneficial for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      Thank you!

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used, (a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, (b) two different models of inflammation - endotoxin-induced uveitis (EAU) and laser ablation are used to study differences in the immune interaction.

      Thank you!

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. While not directly shown in the article, this model would potentially allow for controlling the size, depth, and severity of the laser injury opening interesting avenues for future study.

      We agree that there is an established community that is invested in developing titrated dosimetry for light damage models. As the reviewer recognizes, this parameter space is exceptionally large; we therefore constrained it by choosing a single wavelength that is commonly used in ophthalmoscopy (488 nm) and a fixed duration and exposure regime that created reproducible, mild damage to photoreceptors. At this titration, we created a mild lesion that spares the retina above and below.

      Weaknesses:

      (1) It is unclear based on the current data/study to what extent the mild laser damage phenotype is generalizable to disease phenotypes. The outer nuclear cell loss of 28% and a complete recovery in 2 months would seem quite mild, thus the generalizability in terms of immune-mediated response in the face of retinal remodeling is not certain, specifically whether the key finding regarding the lack of neutrophil recruitment will be maintained with a stronger laser ablation.

      It seems the concern here is whether our finding is generalizable to other damage regimes, especially more severe ones. While speculative, we would suspect that it is not generalizable across different lesions of greater severity. For example, puncturing Bruch’s membrane is an example of a more severe phenotype that is often encountered in laser damage. However, this creates a complicated model that not only induces inflammation, but also compromises BRB integrity and promotes CNV. The parameter space implied by the reviewer’s question is quite vast, and we have therefore tried to summarize the generalizability within our manuscript in:

      P31 L586-588 “There are limitations on how generalizable this mild damage is to more severe damage or disease phenotypes, but this acute damage model can begin to provide clues about how immune cells interact in response to PR loss. In this laser lesion model, we ablate 27% of the PRs in a 50 µm region.”

      (2) Mice numbers and associated statistics are insufficient to draw strong conclusions in the paper on the activity of neutrophils, some examples are below:

      a) 2 catchup mice and 2 positive control EAU mice are used to draw inferences about immune-mediated activity in response to injury. If the goal was to show 'feasibility' of imaging these mouse models for the purposes of tracking specific cell type behavior, the case is sufficiently made and already published by the authors earlier. It is possible that a larger sample size would alter the conclusion.

      We would like to highlight that the total number of mice studied in this report was 28 (18 in-vivo imaging, 10 ex-vivo histology, >40 lesions total). While power analysis is challenging as these are the first studies of their kind, we underscore that in vivo imaging allows those same mice to be studied multiple times longitudinally. This is not possible with traditional histology. Therefore, in vivo imaging not only reveals the temporal progression (unlike histology), but also increases the number of observations beyond a simple count of the “number of mice”.

      The goal of the study was not one of feasibility. The goal was to address a specific question in ocular biology: “do resident CX3CR1 cells recruit neutrophils in early, regional retinal injury?”

      The low numbers that the reviewer points to are not the primary data of the paper but supportive control data. Moreover, we refocus attention on the fact that our study was performed on 28 mice across multiple modalities, each of which corroborates a common finding: neutrophils do not appear to be recruited despite a strong microglial response. This is a central finding of the paper.

      b) There are only 2 examples of extravasated neutrophils in the entire article, shown in the positive control EAU model. With the rare extravasation events of these cells and their high-speed motility, the chance of observing their exit from the vasculature is likely low overall, therefore the general conclusions made about their recruitment or lack thereof are not justified by these limited examples shown.

      The spirit of the challenge raised is that seeing nothing is not proof that nothing occurred. Said more commonly, “absence of evidence is not evidence of absence”, a quote often attributed to Carl Sagan. Yet we push back on this conjecture: we have shown, not only with cutting-edge in vivo imaging but also with ample histological controls and multiple transgenic animals (with corroborating IHC antibodies), that in none of these imaging modalities, at none of the time points we evaluated, did neutrophils aggregate or extravasate in response to photoreceptor ablation.

      Reviewer adds: “the chance of observing their exit from the vasculature is likely low overall…”

      This is the reason that we specifically chose a focal lesion model to increase any possible chance of imaging a rare event. The focal lesion provides both a time and a location for “where” to look. Small 50 micrometer lesions were sufficient to drive a strong local microglial response (figures 5,6,9). This was evidence that local inflammatory cues were present. Yet despite this activation, neutrophils were not recruited to this location. We emphasize that this is a strength of our approach over other pan-retinal damage models that may indeed miss the rare extravasation events that are geographically sparse and happen over hours.

      c) In Figure 3, the 3-day time point post laser injury shows an 18% reduction in the density of ONL nuclei (p-value of 0.17 compared to baseline). In the case of neutrophils, it is noted that "Control locations (n = 2 mice, 4 z-stacks) had 15 ± 8 neutrophils per sq.mm of retina whereas lesioned locations (n = 2 mice, 4 z-stacks) had 23 ± 5 neutrophils per sq.mm of retina (Figure 10b). The difference between control and lesioned groups was not statistically significant (p = 0.19)." These data both come from histology. While the p-values - 0.17 and 0.19 - are similar, in the first case a reduction in ONL cell density is concluded while in the latter, no difference in neutrophil density is inferred in the lesioned case compared to control. Why is there a difference in the interpretation where the same statistical test and methodology are used in both cases? Besides this statistical nuance, is there an alternate possibility that there is an increased, albeit statistically insignificant, concentration of circulating neutrophils in the lesioned model? The increase is nearly 50% (15 ± 8 vs. 23 ± 5 neutrophils per sq.mm) and the reader may wonder if a larger animal number might skew the statistic towards significance.

      The statistics and p-values depend on the analysis strategy. As described in the methods, we used a predetermined 50 micron cylinder for our counting analysis, based on the average lesion size created. This circular window roughly approximates the typical lesion size. However, recall that the damage is created along a single axis (a line projected on the retina); it is therefore possible that the analysis region is too generous to capture the exceptionally local damage.

      While the reviewer is focused on the nuance of statistics, we would like to refocus the conversation on our data, which show that very few neutrophils were observed at all (105 cells from 8 locations, P value reported). What the above critique misses is that all neutrophils were contained within capillaries (Fig 10). We found no examples of extravasated neutrophils. This is the major finding and is supported by our in vivo as well as ex vivo confirmation.

      (2) The conclusions on the relative activity of neutrophils and microglia come from separate animals. The reader may wonder why simultaneous imaging of microglia and neutrophils is not shown in either the EAU mice or the fluorescently labeled catchup mice where the non-labeled cell type could possibly be imaged with phase-contrast as has been shown by the authors previously. One might suspect that the microglia dynamics are not substantially altered in these mice compared to the CX3CR1-GFP mice subjected to laser lesions, but for future applicability of this paradigm of in vivo imaging assessment of the laser damage model, including documenting the repeatability of the laser damage model and the immune cell behavior, acquiring these data in the same animals would be critical.

      A double fluorescent mouse (neutrophils and microglia) is a logical next step of this research. In fact, we have now crossed these transgenic mice and are studying this double labeled mouse in a second manuscript in preparation. However, for this study, it was imperative that the fluorescent imaging light was kept at low levels so as not to contribute to or alter the lesion phenotype and accompanying immune response. Imaging two fluorescent channels to simultaneously view neutrophils and microglia in the same animal would have required at least 2X the visible light exposure. The imaging light levels used in the current study were carefully examined in our previous publications so as not to create additional light damage (Joseph et al 2021).

      (3) Along the same lines as above, the phase contrast ONL images at time points from 3-day to 2-month post laser injury are not shown and the absence of this data is not addressed. This missing data pertains only to the in vivo imaging mice model but are conducted in histology that adequately conveys the time-course of cell loss in the ONL.

      The ocular preparation used for the phase contrast data in figure 2 unfortunately developed an anesthesia-induced cataract that precluded adequate image quality. This is not uncommon in long-term mouse ocular imaging preparations (Feng et al 2023). Instead, we chose to include the phase-contrast data for baseline and 1 day, which show the visually compelling intact and disrupted ONL and demonstrate that the damage is not only focal, but also clearly disrupts the somatic layers of the photoreceptors.

      It is suggested that the reason be elaborated for the exclusion of this data and the simultaneous imaging of microglia and neutrophils mentioned above.

      We agree and we have included the reason for the “not acquired” data within the figure 2 legend:

      “Phase contrast data was not acquired for time points 3 days-2 months due to development of cataract which obscured the phase contrast signal”

      Also, it would be valuable to further qualify and check the claims in the Discussion that "ex vivo analysis confirms in vivo findings" and "Microglial/neutrophil discrimination using label-free phase contrast"

      We maintain that ex vivo analysis both corroborates and, in many cases, confirms our in vivo findings. We feel this is a strength of our manuscript rather than a qualifier. A) Damage localization is visible with OCT and confocal/phase contrast AOSLO in a region that matches the DAPI loss we see ex vivo. B) Disruption of the ONL seen with in vivo AOSLO is of the same size, shape and location as the ONL damage quantified ex vivo. C) No damage or disruption was seen in locations above the lesion with OCT or AOSLO, which matches our finding that only the ONL shows loss of nuclei whereas other more superficial layers are spared. D) Microglial localization is found both in vivo and ex vivo. E) Neutrophil aggregation or extravasation was seen neither in vivo nor ex vivo. Given the evidence above, we contend that this strong synergistic and complementary approach corroborates the experimental data in two ways of studying this tissue.

      We agree that the claims made in the section entitled “Microglial/neutrophil discrimination using label-free phase contrast” are not strongly supported by the phase-contrast imaging presented in this paper. Accordingly, we have since removed this section based on reviewer suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Based on the title and abstract, the main focus of the manuscript appears to be the immune response. However, most of the manuscript is dedicated to the authors' imaging technique. Additionally, several important concerns regarding the investigation of the immune response in the retina need to be addressed.

      We understand that the emphasis may appear to be on the imaging technique. However, because AOSLO is not a widely used technology, we are committed to explaining the technique so that it builds both awareness of and confidence in the way this exciting new data is acquired.

      (2) The authors indicate '1 day post-injury' as a timeframe spanning between 18 and 28 hours post-injury. This is a rather wide window of time, which could potentially affect the analysis. It is necessary to demonstrate that there is no significant difference in the immune response, particularly in terms of microglial morphology and branch orientation, between 18 and 28 hours post-injury.

      We agree that a finer time scale may provide even greater insight into the natural history of the inflammatory response. However, we feel that our chosen time points go above and beyond the temporal precision that is offered by other investigations, especially considering the novel multi-modal imaging performed here. Studies using finer temporal sampling are poised for future investigation.

      (3) The authors should consider using additional markers or complementary techniques to differentiate between microglia and recruited macrophages, such as incorporating immunohistochemistry with P2RY12, a specific marker for microglia that helps distinguish them from macrophages, and CD68 or F4/80, markers for recruited macrophages. It is also crucial for the authors to include a discussion addressing the limitations of using Cx3cr1GFP mice and the potential impact on result interpretation. It is fundamental to validate the findings and clarify the roles of microglia and macrophages.

      The wonder of current IHC is that there are myriad antibodies and labels that “could” be used. We used what we felt were the most compelling for this stage of early investigation. We look forward to studies that employ this wider range of labels. See our response to reviewer 1’s first comment above for addressing the limitations of using Cx3CR1 mice.

      (4) Analyzing neutrophil responses at 24 hours post-injury may be too late to capture the critical early dynamics of inflammation. By this time, the initial recruitment and activation phases of neutrophils may have already peaked or begun to resolve, potentially missing key insights into the immediate immune response. The authors should conduct additional analysis of neutrophil responses at earlier time points post-injury, such as 6 or 12 hours. Including these time points would provide a more comprehensive and conclusive analysis of the neutrophil response, helping to delineate the progression of inflammation and its implications for subsequent healing processes.

      This point has been addressed above. Briefly, we have now included a new experiment (and figure + video) that shows no neutrophil extravasation at earlier time points. We thank the reviewer for this helpful suggestion.

      Reviewer #2 (Recommendations for the authors):

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      (1) There was a lengthy description and verification of light-induced injury and longitudinal tracking of healing, which I believe can be further cleaned up and made more succinct.

      We have cleaned up and reorganized the manuscript (see above response for details), reducing its length by 8%.

      (2) The intention/goal of the paper can be further strengthened. On page 33: "to what extent do neutrophils respond to acute neural loss in the retina?" This particular statement is so clear and really brings out the purpose of this study, and it will be great to see something like this in the opening statement.

      We thank the reviewer for this excellent suggestion. We have modified the final paragraph of the introduction to strengthen our study’s intention.

      P4 L45-47: Here, we ask the question: “To what extent do microglia/neutrophils respond to acute neural loss in the retina?” To begin unraveling the complexities in this response, we deploy a deep retinal laser ablation model.

      (3) The figures are not mentioned in the manuscript in the order they were numbered. It makes it extremely challenging to follow along. The methods/results sections started with Figure 1, then on to Figure 4, then back to Figures 2 and 3, etc. This reviewer recommends re-organizing figures and their order of appearance so the contents of the figures are referred to in the paragraph in the most efficient and clear manner.

      We have re-organized the appearance of figure references throughout the paper.

      (4) Figure 2: phase contrast was not acquired on days 3, 7, and 2 months. Please briefly explain the reason in the caption.

      Addressed above.

      (5) Figure 4 OPL layer, the area highlighted in a dashed circle was meant to demonstrate that perfusion was intact, but I cannot see the flow in the highlighted area very well at day 7 and 2 months (especially 2 months). Please explain.

      Perfusion maps are often difficult to interpret as a static image. Therefore, we have additionally provided the raw video data (“OPL_vasculature_7d” and “OPL_vasculature_2mo”) which helps visualize active perfusion. To the reviewer’s point, videos reveal that RBC motion is maintained in the capillaries of this location.

      (6) While there's a thorough discussion of the biological impact of the finding, the uniqueness of the imaging technique can be better highlighted. Immune response toward injury is highly dynamic and is often the first step of wound healing. To observe such dynamic events longitudinally in the living eye at the cellular level, it requires a special imaging technique such as the type addressed here. The author can better address the technical uniqueness of studying this type of biological event for readers less familiar with AOSLO.

      We agree and following the reviewer’s suggestion have further emphasized the advance in the current manuscript in two additional places:

      (1) Within the introduction

      P3-4 L21-42: “A missed window of interaction is highly problematic in histological study where a single time point reveals a snapshot of the temporally complex immune response, which changes dynamically over time. Here, we use in vivo imaging to overcome these constraints.

      Documenting immune cell interactions in the retina over time has been challenged by insufficient resolution and contrast to visualize single cells in the living eye. The microscopic size of immune cells requires exceptional resolution for detection. Recently, advances in AOSLO imaging have provided micron-level resolution and enhanced contrast for imaging individual immune cells in the retina and without requiring extrinsic dyes(7,23). AOSLO provides multi-modal information from confocal reflectance, phase-contrast and fluorescence modalities, which can reveal a variety of cell types simultaneously in the living eye. Here, we used confocal AOSLO to track changes in reflectance at cellular scale. Phase-contrast AOSLO provides detail on highly translucent retinal structures such as vascular wall, single blood cells(27–29), PR somata(30), and is well-suited to image resident and systemic immune cells.(7,23) Fluorescence AOSLO provides the ability to study fluorescently-labeled cells(25,31,32) and exogenous dyes(27,33) throughout the living retina. These modalities used in combination have recently provided detailed images of the retinal response to a model of human uveitis.(23,34) Together, these innovations now provide a platform to visualize, for the first time, the dynamic interplay between many immune cell types, each with a unique role in tissue inflammation.”

      (2) Within the discussion

      P34-35 L656-662 “Beyond the context of this specific finding, we share this work with the excitement that AOSLO cellular level imaging may reveal the interaction of multiple immune cell types in the living retina. By using fluorophores associated with specific immune cell populations, the complex dynamics that orchestrate the immune response may be examined in this specialized tissue. This work and future studies may reveal further insights to the interactions of single immune cells in the living body in a non-invasive way.”

      Reviewer #3 (Recommendations for the authors):

      Some other comments:

      (1) The reader may wonder why if all findings are confirmed by histology would an in vivo imaging model be needed. This does not need a generalized explanation given the typical virtues of an in vivo model, but perhaps the authors may want to amplify their findings in the current context, for example, those on the shorter minutes to hours timescales (Figure 2, Supplement 1) that would have been resource and time intensive, and likely impossible, to gather via histology alone.

      The reviewer appropriately underscores the utility of in vivo imaging above histological-only investigation. In response, we have added text in the introduction to emphasize the nuanced, but important value of both longitudinal imaging as well as dynamic imaging which is not possible with conventional histology (e.g. blood perfusion status, immune cell interactions etc.)

      P3-4 L21-42 (these points also addressed in response to reviewer #2 above)

      (2) A few questions and comments on the laser ablation model

      - It is alluded to in the Discussion in Lines 519-521 that the procedure is highly reproducible (95%) but the associated data for this repeatability metric is not shown.

      We agree that the criterion for determining a “successful lesion” requires further elaboration. Therefore, we have now included the criteria for successful lesions in the methods as well as discussion (in bullet below):

      Methods:

      P9-10 L129-133: “This protocol produced a hyper-reflective phenotype in the >40 locations across 28 mice. In rare cases, the exposure yielded no hyper-reflective lesion and were often in mice with high retinal motion, where the light dosage was spread over a larger retinal area. These locations were not included in the in-vivo or histological analysis.”

      - The methods state that a 24 x 1-micron line is focused on the retina, but all lesions seem to appear elliptical where the major to minor axis ratio is a lot smaller than this intended size. One wonders what leads to this discrepancy.

      We expect that this observation is related to the response above. We have added the following:

      Discussion:

      P27 L497-505: “The damage took on an elliptical form, likely due to: 1) Eye motion from respiration and heart rate which spreads the light over a larger integrative area (rather than line). 2) The impact of focal light scatter. 3) A micron-thin line imparting damage on cells that are many microns across manifesting as an ellipse. The majority of light exposures produced lesions of this elliptical shape. In a few conditions, for the reasons described above, the exposure failed to produce a strong, focal damage phenotype. To improve lesion reproducibility, future experiments should control for subtle eye motion affecting light damage, especially for long exposures.”

      (3) Lastly, a thickening is noted in the ONL after laser injury that seems to cause a thinning of the INL as well (Figure 3) which may increase the apparent INL nuclei density.

      The reviewer’s careful eye finds local swelling after injury. However, despite swelling, the segregation between INL and ONL was maintained in all days we examined. Thus, no ONL cells were included in INL counts (see figure 3A & 3D).

      Also, the ONL - inner (panel B) seems to show a little reduction in cell density in the same elliptical shape as the outer ONL in panel C.

      We agree with this observation, and it was one of the reasons we included this detailed analysis of both the inner and outer half of the ONL. Our finding is that there is more prominent loss of nuclei in the outer half of the ONL. While the mechanism for this is not understood, we felt it was an important finding to include, and it further shows the axial specificity of the light damage we are inducing (especially at the day 1 observation).

      Lastly, the reduction in nuclear density is visually obvious in the ONL at the 1 and 3-day time points but the p-statistic does not seem to convey this. One may consider performing the analysis on panel F on a smaller region surrounding the lesion to more reliably reveal these effects.

      Related to the response above, the ONL shows a persistence of nuclei in the inner half of that layer, whereas the outer half shows a visible reduction. Therefore, we expect that the reviewer is correct that a statistical analysis that considers just the outer half of the ONL would likely show strong statistical significance. The difficulty, however, is that our analysis strategy counted all cells within a 50 micron diameter cylinder through the entirety of the ONL (meaning strong loss in the outer half was attenuated by weak loss in the inner half). A more detailed sub-layer analysis is challenging because the notable retinal remodeling over days to weeks makes it hard to treat layers within the ONL as viable landmarks for the requested analysis.
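      The attenuation described above can be illustrated with a short calculation. This is only a sketch: the nuclei counts below are hypothetical values chosen for illustration and are not data from the study. It shows how pooling both halves of the ONL into one counting cylinder yields a fractional loss that lies between the weak inner-half loss and the strong outer-half loss.

```python
def fractional_loss(baseline, after):
    """Fraction of nuclei lost relative to the baseline count."""
    return (baseline - after) / baseline


# Hypothetical nuclei counts inside one counting cylinder (NOT study data).
inner_baseline, inner_after = 200, 190   # weak loss in the inner half
outer_baseline, outer_after = 200, 120   # strong loss in the outer half

inner_loss = fractional_loss(inner_baseline, inner_after)
outer_loss = fractional_loss(outer_baseline, outer_after)

# Pooling both halves, as a whole-ONL cylinder count does, dilutes the effect.
whole_loss = fractional_loss(
    inner_baseline + outer_baseline, inner_after + outer_after
)

print(f"inner: {inner_loss:.3f}, outer: {outer_loss:.3f}, whole ONL: {whole_loss:.3f}")
```

      With equal baseline counts in the two halves, the pooled loss is simply the mean of the two sub-layer losses, which is why a pronounced outer-half effect can appear modest in a whole-layer statistic.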

      (4) In Figure 6, the NIR confocal image and fluorescent microglia seem to share the same shape, starting from the OPL and posterior to it. This is particularly evident in the 3 and 7-day time points in the ONL and ONL/IS images. This departs from lines 567-577 where the claim is made that the hyperreflective phenotype in NIR images does not emerge from the microglia and neutrophils. This discrepancy should be clarified. It may be so that the hyperreflective phenotype as observed by Figure 2 at shorter timescales is not related to the microglia but the locus of hyper-reflections changes at longer time scales to involve the microglia as well as in Figure 6. One potential clue/speculation of the common shapes/size in confocal hyper-reflectance and fluorescent microglia of Figure 6 comes from Figure 9 where the microglia seem to engulf the photoreceptor phagosomes in the DAPI stains. It is possible that the hyper-reflections arise from the phagosomes but their co-localization with microglia seems to demonstrate a shared size/shape. As an addendum to the first point, such correlations are a power of the in vivo model and impossible to achieve in histology.

      The reviewer shows a deep understanding of our data. We agree with many of the points, but for the purpose of the paper many of the above suggestions are speculative, and we have chosen not to elaborate on them as they are not definitively supported by the data. Instead, we direct the reader to an important finding: within hours, the hyper-reflective phenotype is seen in both OCT and AOSLO, whereas microglial somas/processes have not yet migrated into the hyper-reflective region. We have now emphasized this point in the discussion section:

      P29-30 L543-552: “A common speculation is that the increased backscatter may arise from local inflammatory cells that activate or move into the damage location. In our data, confocal AOSLO and OCT revealed a hyperreflective band at the OPL and ONL after 488 nm light exposure (Figure 2a, b). We found that the hyperreflective bands appeared within 30 minutes after the laser injury, preceding any detectable microglial migration toward the damage location (Figure 2 – figure supplement 1 and Figure 6 – figure supplement 1). We thus conclude that the initial hyperreflective phenotype is not caused by microglial cell activity or aggregation.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the two specific ones used here.

      Technical Note: We showed the utility of CellSeg3D in the first submission and in our revision on 5 distinct datasets, 4 of which we showed F1-Score performance on. We do not know which “two datasets” are referenced. We also already showed that the method is not limited to LSM, as it was used on confocal images; we already limited our scope and changed the title in the last rebuttal, but to be clear, we also benchmark on two non-LSM datasets.

      In this revision, we have additionally extended our benchmarking of Cellpose and StarDist to all 4 benchmark datasets, where our WNet3D (our novel contribution of a self-supervised model) outperforms or matches these supervised baselines. Moreover, we perform rigorous testing of our model’s generalization by training on one dataset and testing generalization to the other 3; we believe this is on par with (or beyond) what most cell segmentation papers do, thus we hope that “incomplete” can now be updated.
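      For readers less familiar with the F1-Score used in such benchmarks, the following is a minimal sketch of one common way to score an instance segmentation. It assumes a predicted object counts as a true positive when its intersection over union (IoU) with a not-yet-matched ground-truth object reaches a chosen threshold. The function names, the greedy matching, and the 0.5 default are illustrative assumptions; the exact matching criterion used in the manuscript is not stated in this response.

```python
def iou(a, b):
    """Intersection over union of two sets of voxel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(gt_instances, pred_instances, iou_threshold=0.5):
    """F1 = 2TP / (2TP + FP + FN) with greedy one-to-one IoU matching.

    Each instance is a set of (z, y, x) voxel coordinates.
    """
    unmatched_gt = list(gt_instances)
    tp = 0
    for pred in pred_instances:
        # Pick the remaining ground-truth object that overlaps this prediction most.
        best = max(unmatched_gt, key=lambda g: iou(g, pred), default=None)
        if best is not None and iou(best, pred) >= iou_threshold:
            unmatched_gt.remove(best)
            tp += 1
    fp = len(pred_instances) - tp   # predictions without a match
    fn = len(gt_instances) - tp     # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (hypothetical objects): one good match, one spurious, one missed.
gt = [{(0, 0, 0), (0, 0, 1)}, {(5, 5, 5)}]
pred = [{(0, 0, 0), (0, 0, 1), (0, 0, 2)}, {(9, 9, 9)}]
score = f1_at_iou(gt, pred)
print(score)
```

      In the toy example the first prediction overlaps its ground-truth object with IoU 2/3, so there is one true positive, one false positive and one false negative.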

      Public Reviews:

      Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the two specific ones used here.

      Thank you again for your time. We had already benchmarked the performance of WNet3D (our 3D SSL contribution) on four datasets; thus, we do not know which two you refer to. Moreover, we have now additionally benchmarked Cellpose and StarDist on all four, so readers can see that on all datasets WNet3D outperforms or matches these supervised methods.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper suggests to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      We respectfully disagree; we benchmark on four 3D datasets: three curated by others and used in leading ML conference proceedings, and one that we provide that is a new ground truth 3D dataset - the first of its kind - on mesoSPIM-acquired brain data. We believe benchmarking on four datasets is on par with (or beyond) current best practices in the field. For example, Cellpose curated one dataset and tested on held-out test data from this one dataset (https://www.nature.com/articles/s41592-020-01018-x) and benchmarked against StarDist and Mask R-CNN (two models). StarDist (Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy) benchmarked on two datasets and against two models, IFT-Watershed and 3D U-Net. Thus, we feel our benchmarking on more models and more datasets is sufficient to claim that our model and associated code are of interest to readers, and that it supports our claims (for comparison, Cellpose’s title is “Cellpose: a generalist algorithm for cellular segmentation”, which is much broader than our claim).

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      We have now expanded the manuscript quite significantly. To be clear, we did show that our method works on non-cleared tissue; the Mouse Skull, 3D platynereis-Nuclei, and 3D platynereis-ISH-Nuclei datasets are not cleared tissue, and not all were acquired with LSM, but rather with confocal microscopy. We attempted to make that clearer in the main text.

      Additionally, we do not believe that cells need to be well separated or that the background needs to be perfectly clean. We removed statements like "often in cleared neural tissue", expanded the benchmarking, and added a new demo figure for readers to judge. As in the last rebuttal, we provide video evidence (https://www.youtube.com/watch?v=U2a9IbiO7nE) of WNet3D working on the Mouse Skull dataset, which is densely packed and hard for a human to segment, and we linked this directly in the figure caption.

      We have re-written the main manuscript in an attempt to clarify the limitations, including a dedicated “limitations” section. Thank you for the suggestion.

      It does appear that the proposed method works very well on the two investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance cannot be matched with simple image processing.

      We want to stress again that we benchmarked WNet3D on four datasets, not two. We have now additionally added benchmarking with Cellpose, StarDist and a non-deep learning method as requested (see new Figures 1 and 3).

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods.

      That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.:

      -  comparison to thresholding (with the same post-processing as the proposed method)

      -  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      We added a non-deep learning based approach, namely, comparing directly to thresholding with the same post hoc approach we use to go from semantic to instance segmentation. WNet3D (and other deep learning approaches) perform favorably (see Figure 2 and 3).
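      To make concrete what a baseline of "thresholding followed by connected component analysis" does, here is a minimal, dependency-free sketch. The fixed threshold, the 6-connectivity, and the function name are assumptions made for illustration; the response does not state how the threshold was chosen or which post-processing parameters were used. In practice one would more likely rely on an established routine such as `scipy.ndimage.label` rather than a hand-written flood fill.

```python
from collections import deque


def label_components(volume, threshold):
    """Threshold a 3D volume and label its 6-connected foreground components.

    `volume` is a nested list indexed as volume[z][y][x]. Voxels strictly
    greater than `threshold` are foreground. Returns (labels, count), where
    `labels` has the same shape, with 0 for background and 1..count for objects.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] <= threshold or labels[z][y][x]:
                    continue  # background, or already part of a labelled object
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:  # breadth-first flood fill of one object
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbours:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                                and volume[nz][ny][nx] > threshold
                                and not labels[nz][ny][nx]):
                            labels[nz][ny][nx] = count
                            queue.append((nz, ny, nx))
    return labels, count


# Toy 2 x 2 x 4 volume (hypothetical intensities) containing two bright objects.
volume = [
    [[9, 9, 0, 0], [0, 0, 0, 8]],
    [[0, 0, 0, 0], [0, 0, 0, 7]],
]
labels, count = label_components(volume, threshold=5)
print(count)
```

      A baseline like this separates objects only where background voxels lie between them, which is why its performance on densely packed nuclei is the relevant point of comparison.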

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

      We updated the plugin and code for the revision at your request to make this possible directly in the napari GUI, in addition to our scripts and Jupyter Notebooks (please see main and/or `pip install --upgrade napari-cellseg3d`; the current version is 0.2.1). Of course, this means the original submission code (May 2024) will not have this in the GUI, so you would need to update to test it. Alternatively, you can see the demo video we now provide for ease: https://www.youtube.com/watch?v=U2a9IbiO7nE (we understand testing code takes a lot of time and commitment).

      We greatly thank the reviewer for their time, and we hope our clarifications, new benchmarking, and re-write of the paper now allow them to change their assessment from incomplete to a more favorable and reflective eLife adjective.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      -  The idea behind the self-supervised learning loss is interesting.

      -  It provides a new annotated dataset for an important segmentation problem.

      -  The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      -  The comparison to other methods on the provided dataset is extensive and experiments are reproducible via public notebooks.

      Weaknesses:

      The experiments presented by the authors support the core claims made in the paper. However, they do not convincingly prove that the method is applicable to segmentation problems with more complex morphologies or more crowded cells/nuclei.

      Major weaknesses:

      (1) The method only provides functionality for semantic segmentation outputs and instance segmentation is obtained by morphological post-processing. This approach is well known to be of limited use for segmentation of crowded objects with complex morphology. This is the main reason for prediction of additional channels such as in StarDist or CellPose. The experiments do not convincingly show that this limitation can be overcome as model comparisons are only done on a single dataset with well separated nuclei with simple morphology. Note that the method and dataset are still a valuable contribution with this limitation, which is somewhat addressed in the conclusion. However, I find that the presentation is still too favorable in terms of the presentation of practical applications of the method, see next points for details.

      Thank you for noting the method's strengths and core features. Regarding weaknesses, we have revised the manuscript again and added direct benchmarking, now on four datasets, and a fifth “worked example” (https://www.youtube.com/watch?v=3UOvvpKxEAo&t=4s) in a new Figure 4.

      We also re-wrote the paper to more thoroughly present the work (previously we adhered to the “Brief Communication” eLife format), and added an explicit note in the results about model assumptions.

      (2) The experimental set-up for the additional datasets seems to be unrealistic as hyperparameters for instance segmentation are derived from a grid search and it is unclear how a new user could find good parameters in the plugin without having access to already annotated ground-truth data or an extensive knowledge of the underlying implementations.

      We agree that, as with any self-supervised method, the user will need a sense of what a good outcome looks like; that is why we provide Google Colab Notebooks (https://github.com/AdaptiveMotorControlLab/CellSeg3D/tree/main/notebooks) and the napari-plugin GUI for extensive visualization and even the ability to manually correct small subsets of the data and refine the WNet3D model.

      We attempted to make this clearer with a new Figure 2 and additional functionality built directly into the plugin (such as the grid search). We believe this “trade-off” of SSL approaches against very labor-intensive 3D labeling is often worth it; annotators are also biased, so extensive checking of any GT data is equally required.

      We also added the “grid search” functionality in the GUI (please `pip install --upgrade napari-cellseg3d`; the latest version is v0.2.1) to supplement the previously shared Notebook (https://github.com/C-Achard/cellseg3d-figures/blob/main/thresholds_opti/find_best_thresholds.ipynb) and added a new YouTube video: https://www.youtube.com/watch?v=xYbYqL1KDYE.
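
      To illustrate what such a threshold grid search does, here is a minimal sketch. It is not the CellSeg3D implementation: the function names, the Dice criterion, and the toy data are assumptions made for illustration, and it presumes a small annotated subset is available to score against.

```python
def dice(pred, truth):
    """Dice overlap between two equal-length binary sequences."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total


def grid_search_threshold(probs, truth, thresholds):
    """Return (best_threshold, best_dice) over the candidate thresholds."""
    best = None
    for th in thresholds:
        pred = [p >= th for p in probs]  # binarize the foreground probabilities
        score = dice(pred, truth)
        if best is None or score > best[1]:
            best = (th, score)
    return best


# Hypothetical voxel-wise foreground probabilities and a small annotated subset.
probs = [0.05, 0.20, 0.55, 0.70, 0.90, 0.40]
truth = [False, False, True, True, True, False]
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

best_th, best_dice = grid_search_threshold(probs, truth, thresholds)
```

      Scoring against only a small annotated subset keeps the labeling cost low, but the selected threshold is only as reliable as that subset is representative.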

      (3) Obtaining segmentation results of similar quality as reported in the experiments within the napari plugin was not possible for me. I tried this on the "MouseSkull" dataset that was also used for the additional results in the paper.

      Again, we are sorry this did not work for you, but we added new functionality in the GUI and made a demo video (https://www.youtube.com/watch?v=U2a9IbiO7nE); you can either update your CellSeg3D code or watch the video to see how we obtained these results.

      Here, I could not find settings in the "Utilities->Convert to instance labels" widget that yielded good segmentation quality and it is unclear to me how a new user could find good parameter settings. In more detail, I cannot use the "Voronoi-Otsu" method due to installation issues that are prohibitive for a non expert user and the "Watershed" segmentation method yields a strong oversegmentation.

      Sorry to hear of the installation issue with Voronoi-Otsu; we updated the documentation and the GUI to hopefully make this easier to install. While we do not claim this code is for beginners, we do aim to be a welcoming community, thus we provide support on GitHub, extensive docs, videos, the GUI, and Google Colab Notebooks to help users get started.

      Comments on revised version

      Many of my comments were addressed well:

      -  It is now clear that the results are reproducible as they are well documented in the provided notebooks, which are now much more prominently referenced in the text.

      Thanks!

      -  My concerns about an unfair evaluation compared to CellPose and StarDist were addressed. It is now clear that the experiments on the mesoSPIM dataset are extensive and give an adequate comparison of the methods.

      Thank you. Note that we additionally benchmarked Cellpose and StarDist on the three additional datasets (for R1); we hope this also increases your confidence in our approach.

      -  Several other minor points like reporting of the evaluation metric are addressed.

      I have changed my assessment of the experimental evidence to incomplete/solid and updated the review accordingly. Note that some of my main concerns with the usability of the method for segmentation tasks with more complex morphology / more crowded cells and with the napari plugin still persist. The main points are (also mentioned in Weaknesses, but here with reference to the rebuttal letter):

      - Method comparison on datasets with more complex morphology etc. are missing. I disagree that it is enough to do this on one dataset for a good method comparison.

      We benchmarked WNet3D (our contribution) on four datasets, and to aid the readers we additionally now added Cellpose and StarDist benchmarking on all four. WNet3D performs favorably, even on the crowded and complex Mouse Skull data. See the new Figure 3 as well as the associated video: https://www.youtube.com/watch?v=U2a9IbiO7nE&t=1s.

      -  The current presentation still implies that CellSeg3d **and the napari plugin** work well for a dataset with complex nucleus morphology like the Mouse Skull dataset. But I could not get this to work with the napari plugin, see next points.

      - First, deriving hyperparameters via grid search may lead to over-optimistic evaluation results. How would a user find these parameters without having access to ground-truth? Did you do any experiments on the robustness of the parameters?

      -  In my own experiments I could not do this with the plugin. I tried this again, but ran into the same problems as last time: pyClesperanto does not work for me. The solution you link requires updating openCL drivers and the accepted solution in the forum post is "switch to a different workstation".

      We apologize for the confusion here; the accepted solution (not accepted by us) was user-specific: they switched workstations and it worked, so that was their solution. Other comments in the thread also solved the issue. For ease, this package can be installed on Google Colab (here is the link from our repo: https://colab.research.google.com/github/AdaptiveMotorControlLab/CellSeg3d/blob/main/notebooks/Colab_inference_demo.ipynb), where pyClesperanto can be installed without issue via: !pip install pyclesperanto-prototype

      This a) goes beyond the time I can invest for a review and b) is unrealistic to expect computationally inexperienced users to manage. Then I tried with the "watershed" segmentation, but this yields a strong oversegmentation no matter what I try, which is consistent with the predictions that look like a slightly denoised version of the input images and not like a proper foreground-background segmentation. With respect to the video you provide: I would like to see how a user can do this in the plugin without having prior knowledge of good parameters or just pasting code, which is again not what you would expect a computationally inexperienced user to do.

      We agree with the reviewer that the user needs domain knowledge, but we never claim our method was for inexperienced users. Our main goal was to show a new computer vision method with self-supervised learning (WNet3D) that works on LSM and confocal data for cell nuclei. To this end, we made you a demo video to show how a user can visually perform a thresholding check https://www.youtube.com/watch?v=xYbYqL1KDYE&t=5s, and we added all of these new utilities to the GUI, thanks for the suggestion. Otherwise, the threshold can also be done in a Notebook (as previously noted).

      I acknowledge that some of these points are addressed in the limitations, but the text still implies that it is possible to get good segmentation results for such segmentation problems: "we believe that our self-supervised semantic segmentation model could be applied to more challenging data as long as the above limitations are taken into account." From my point of view the evidence for this is still lacking and would need to be provided by addressing the points raised above for me to further raise the Incomplete/solid rating, especially showing how this can be done with the napari plugin. As an alternative, I would also consider raising it if the claims are further reduced and acknowledge that the current version of the method is only a good method for well separated nuclei.

      We hope our new benchmarking and clear demo on four datasets help improve your confidence in the evidence for our approach. We also refined our overall text and hope our contributions, the limitations, and the advantages are now clearer.

      I understand that this may be frustrating, but please put yourself in the role of a new reader of this work: the impression that is made is that this is a method that can solve 3D segmentation tasks in light-sheet microscopy with unsupervised learning. This would be a really big achievement! The wording in the limitation section sounds like strategic disclaimers that imply that it is still possible to do this, just that it wasn't tested enough.

      But, to the best of my assessment, the current version of the method only enables the more narrow case of well separated nuclei with a simple morphology. This is still a quite meaningful achievement, but more limited than the initial impression. So either the experimental evidence needs to be improved, including a demonstration how to achieve this in practice, including without deriving parameters via grid-search and in the plugin, or the claim needs to be meaningfully toned down.

      Thanks for raising this point; we do think that WNet3D and the associated CellSeg3D package, which aims to continue integrating state-of-the-art models, are a non-trivial step forward. Have we completely solved the problem? Certainly not. But given the limited 3D cell segmentation tools that exist, we hope this, coupled with our novel 3D dataset, pushes the field forward. We do not merely show that it works on the narrow, well-separated use case; rather, we show that it works even better than supervised models on the very challenging Mouse Skull benchmark. Given that we now show evidence that we outperform or match supervised algorithms with an unsupervised approach, we respectfully do think this is a noteworthy achievement. Thank you for your time in assessing our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry that we didn’t provide sufficient grounds to introduce the subtypes. We have updated this in the revised manuscript, in Lines 102-104 as:

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we applied the subtype analysis as a criterion to identify the modulation in neuron populations, rather than sorting neurons into exclusively different cell types.

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, mixed selectivity, specifically the target-motion modulation of reach-direction tuning, is a significant feature of the single neurons. We categorized the neurons into three subclasses not to claim absolute cell types, but to distinguish target-motion modulation patterns. To further characterize these three patterns, we also investigated their interaction by perturbing connection weights in the RNN.

      Yes, it’s important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we observed population neural state with sliding windows, and we focused on the period around movement onset (MO) due to the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). Then, the single-unit analyses were implemented.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to capture better the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We’ve introduced these relevant studies in the updated manuscript in Lines 422-426 as:

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      A great suggestion; however, it is hardly feasible as the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      A nice suggestion. The goodness of fit of the simple model (movement direction only) is much worse than that of the complex models (which include target speed). We’ve updated the results in the revised manuscript in Lines 119-122, as: “We found that the adjusted R² of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”
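
      For readers unfamiliar with the metric, the adjusted R² penalizes a model for each additional predictor, which is what makes the comparison between a simple and a full tuning model fair. The sketch below uses the standard formula; the function name, the parameter counts, and the toy firing rates are assumptions for illustration and are not taken from the manuscript.

```python
def adjusted_r2(y, y_hat, n_params):
    """Adjusted R^2 for observations y, predictions y_hat, and n_params predictors
    (excluding the intercept): 1 - (1 - R^2) * (n - 1) / (n - n_params - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


# Hypothetical firing rates and model predictions.
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pred = [1.5, 2.0, 3.0, 4.0, 5.0, 5.5]

adj_simple = adjusted_r2(rates, pred, n_params=1)  # same fit, fewer predictors
adj_full = adjusted_r2(rates, pred, n_params=3)    # same fit, more predictors
```

      With identical predictions, the model with more predictors receives the lower adjusted R², so a full model only wins the comparison if its extra terms explain enough additional variance.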

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.

      A good point. It is a pity that we haven’t found an appropriate unsupervised method.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. Because the movement onset (MO) is the key time point of this study, we colored this time period in Figure 1 to highlight the perimovement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of a SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we’ve updated the continuous decoding results with support vector regression (SVR) in Figure S7A and in Lines 170-173 as:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed decoding analysis on RNN units and updated in Figure S12A and Lines 333-334 as: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We also have included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342 as: “We then performed canonical component analysis (CCA) and Procrustes analysis (Table S2; see Methods), the results also indicated the similarity between network dynamics and neural dynamics.”
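
      As a rough illustration of what a Procrustes comparison between two sets of trajectories measures, here is a minimal NumPy sketch. It is not the authors' analysis code: the function name and the synthetic point clouds are assumptions, and the disparity is computed after centering and unit-norm scaling of both inputs, allowing any orthogonal transform (including reflections).

```python
import numpy as np


def procrustes_disparity(X, Y):
    """Residual sum of squares after optimally translating, scaling, and
    rotating Y onto X (both of shape (n_points, n_dims)). 0 means identical shape."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)  # unit Frobenius norm
    Y0 = Y0 / np.linalg.norm(Y0)
    # Singular values of the cross-covariance give the best achievable alignment.
    s = np.linalg.svd(X0.T @ Y0, compute_uv=False)
    return 1.0 - s.sum() ** 2


rng = np.random.default_rng(0)
neural = rng.normal(size=(50, 3))                 # stand-in for neural states
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal transform
model_same = 2.5 * neural @ Q + 1.0               # same geometry, moved and scaled
model_other = rng.normal(size=(50, 3))            # unrelated geometry

d_same = procrustes_disparity(neural, model_same)
d_other = procrustes_disparity(neural, model_other)
```

      A disparity near zero indicates that the two geometries differ only by translation, scaling, and rotation, which is why such measures are less sensitive to arbitrary choices of axes than a visual comparison of PCA plots.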

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows the sensory modulation of motor tuning in single units and in the neural population during the motor execution period. It is a pity that the findings were constrained to certain time windows. We are still working on this task and hope to address this in our follow-up work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the computation-through-dynamics framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      It’s a great idea! We are on the way, and it seems promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")-this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as suggested by the reviewer, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from numerous possible candidates due to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with reach-direction representation. More details can be found in Figure S8C and its caption as below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
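
      The regression-based alternative raised by the reviewer can be sketched as follows. This is an illustrative sketch, not the dPCA procedure described in the caption above: the function name, array shapes, and synthetic data are assumptions. It fits a linear readout from neural activity to kinematics and splits the state space into the dimensions that drive the readout (“potent”) and their orthogonal complement (“null”).

```python
import numpy as np


def potent_null_spaces(neural, kinematics):
    """neural: (T, N) activity; kinematics: (T, K) hand variables.
    Returns the readout W (N, K) plus orthonormal bases for the potent
    space (N, k) and the null space (N, N - k)."""
    Xc = neural - neural.mean(axis=0)
    Kc = kinematics - kinematics.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Kc, rcond=None)   # kinematics ~ Xc @ W
    U, s, _ = np.linalg.svd(W, full_matrices=True)
    k = int((s > 1e-10 * s.max()).sum())          # numerical rank of the readout
    return W, U[:, :k], U[:, k:]


rng = np.random.default_rng(1)
activity = rng.normal(size=(200, 6))              # synthetic population activity
W_true = rng.normal(size=(6, 2))                  # hypothetical readout weights
hand_vel = activity @ W_true                      # synthetic 2D hand velocity

W, potent, null = potent_null_spaces(activity, hand_vel)

# Activity confined to the null space produces no change in the readout.
centered = activity - activity.mean(axis=0)
null_only = centered @ null @ null.T
readout_from_null = null_only @ W
```

      In this framing, target-motion information that does not alter the hand movement would be expected to live in the null-space projection, where its variance can then be examined separately.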

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clarify the definition of the “none” type, which can be misleading. The 43% of unclassifiable nodes include inactive ones; when only active (task-related) nodes are included, the ratio of unclassifiable nodes is much lower. We recomputed the ratios with only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we found that this perturbation design might not be fully developed: since the top two PCs are highly correlated with movement direction, such a move—similar to exchanging two states within the same cluster but under different target-motion conditions—would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We’ve highlighted the resampling results as an important control in the revised manuscript in Figure S11 and Lines 257-260 as:

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”
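
      A control of this kind can be illustrated with a minimal Python sketch. It assumes each trial is described by a target-motion condition label and a single hand-speed value, and it equalizes the number of trials per speed bin across conditions. The function name, the trial format, and the bin-matching rule are illustrative assumptions and not the authors' actual procedure.

```python
import random
from collections import defaultdict

def speed_matched_resample(trials, bin_width, seed=0):
    """Subsample trials so each condition contributes equally to every speed bin.

    trials: list of (condition, hand_speed) tuples (hypothetical format).
    Returns the sorted indices of the trials that are kept.
    """
    rng = random.Random(seed)
    # Group trial indices by speed bin, then by condition.
    by_bin = defaultdict(lambda: defaultdict(list))
    conditions = set()
    for idx, (cond, speed) in enumerate(trials):
        by_bin[int(speed // bin_width)][cond].append(idx)
        conditions.add(cond)

    kept = []
    for per_cond in by_bin.values():
        # Skip bins that are not populated by every condition.
        if set(per_cond) != conditions:
            continue
        # Keep the same number of trials from each condition in this bin.
        n_keep = min(len(indices) for indices in per_cond.values())
        for indices in per_cond.values():
            kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)
```

      After such a resampling, the hand-speed histograms of the conditions coincide at the resolution of the chosen bin width, so any remaining difference in neural geometry cannot be attributed to speed differences at that resolution.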

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we’ve added the results of three alternative models in Table S4 and Lines 355-365 as below:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were the same as those for the original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1-PC2 plane or a barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These results imply that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish to make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. By perturbations, we intended to explore the contribution of interaction between certain subpopulations. We’ve included the ablation experiments in the updated manuscript in Table S3 and Lines 344-346 as below: “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting and open questions. In the revised manuscript, we discuss these in Lines 435-446, as below: “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task set a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 has not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PD-shift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean the joint tuning of both target-motion and movement. In this case, the target-motion modulated neurons (regardless of the modulation type) are of mixed selectivity. The term “motor intention” refers to Mazzoni et al., 1996, Journal of Neurophysiology. We also revised the manuscript for better readership.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.”

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations :

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. As a similar task design has been reported previously, we finally decided not to provide extra figures or videos. Still, we appreciate this nice suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thronlow Lamson et al., the authors develop a "beads-on-a-string" or BOAS strategy to link diverse hemagglutinin head domains, to elicit broadly protective antibody responses. The authors are able to generate varying formulations and lengths of the BOAS and immunization of mice shows induction of antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, that only 3 mice were used for most immunization experiments, and that important controls and analyses of how the BOAS alone, rather than the inclusion of diverse heads, impacts humoral immunity are missing.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Controls for how different hemagglutinin heads impact immunity versus the multivalency of the BOAS.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads and multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed similar HA binding titers relative to the 8-mer and BOAS NP groups, the cocktail group-elicited sera was unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeated immunizations with n=5 mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare to individual HAs immunized as cocktails and did not compare to other mosaic NP published earlier. It is thus difficult to assess how their BOAS compare.

      Other weaknesses include the rationale as to why these subtypes were chosen and also an explanation of why there are different sizes of the HA1 construct (apart from expression). Have the authors tried other lengths? Have they expressed all of them as FL HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group comprised of a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited similar HA binding titers relative to the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms but acknowledge that distinct differences in their immunogen design platforms make direct comparisons to ours difficult, which is ultimately why we did not use them as comparators for our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). Here, HA heads, with similar boundaries to ours, were selected from historical H1N1 strains. These NPs, however, were significantly less antigenically diverse relative to our BOAS NPs as they did not include any group 2 (e.g., H7, H9) or B influenza HAs; restricting their multivalent display to group 1 H1N1s likely was an important factor in how they were able to achieve broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these included full-length HAs with the conserved stem region, which likely has a significant impact on the elicited cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), where the authors designed similar NPs with four tandemly-linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. In doing so, these diagnostic antibodies could confirm the presence and conformational integrity of each component. We intentionally did not include HA subtypes for which we did not have a conformation-specific antibody.

      The different sizes of HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2 which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counter-intuitive result is the high levels of neutralization titers seen in vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would constitute informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      We agree that this is on the “upper end” of recombinant protein dose. While we did not do a dose-response, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we’ve modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per mole basis; this could favor the overall amount of the smaller immunogens. However, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate on a molar basis for this study. Any promising immunogens will be evaluated in a dose-response study to optimize elicited responses.
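
      The relationship between a fixed mass dose and the resulting molar dose can be made concrete with a short Python sketch. The response does not state the molecular weights of the immunogens, so the values used in the usage note are purely hypothetical placeholders, and the function names are illustrative.

```python
def nanomoles(dose_ug, mw_kda):
    """Convert a mass dose in micrograms to nanomoles for a given molecular weight in kDa."""
    # 1 ug = 1e-6 g and 1 kDa = 1e3 g/mol, so ug / kDa = 1e-9 mol = 1 nmol.
    return dose_ug / mw_kda

def molar_fold_difference(mw_small_kda, mw_large_kda):
    """Fold excess, in moles, of the smaller immunogen when both are given at equal mass."""
    # At equal mass, the mole ratio is the inverse of the molecular-weight ratio.
    return mw_large_kda / mw_small_kda
```

      For example, with hypothetical molecular weights of 80 kDa and 200 kDa, `molar_fold_difference(80, 200)` gives 2.5, meaning that a 20 µg dose of the smaller construct delivers 2.5 times as many molecules as a 20 µg dose of the larger one.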

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like BOAS are forming clusters, rather than a straight line. Do these form aggregates over time? Both at 4 degrees over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between each component, we do not anticipate that any rigidity would be imposed resulting in a “straight line”. Nevertheless, we appreciate the reviewer’s concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. Post-IMAC purification, there is a minor peak (<10% of total peak height) at ~9 mL corresponding to aggregation. Note that we excluded this aggregate fraction for immunizations. Post freeze-thaw cycle, we can see that immediately upon thawing (<24 hrs), the BOAS maintain a homogeneous peak with no significant (<10%) aggregation or degradation peak. However, ~1 week after the freeze-thaw cycle at 4C, additional peaks within the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372)

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148)

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report the log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and added clarity to statistical analyses used. We have added the following lines to the methods section

      lines 427-431:

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to find the dilution at which the blank-subtracted 450 nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
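
      Once a 4PL curve has been fitted, the threshold crossing can be obtained in closed form by inverting the equation. The Python sketch below assumes the common parameterization with bottom, top, EC50 and Hill slope, with x expressed as serum concentration (1/dilution). The parameter names and this parameterization are assumptions for illustration; the authors performed their fits in Prism.

```python
import math

def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration x (rises from bottom to top)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

def endpoint_titer(bottom, top, ec50, hill, threshold=0.1):
    """Reciprocal dilution at which the fitted curve crosses the threshold.

    x is serum concentration written as 1/dilution (neat serum = 1), so the
    titer is 1/x at the crossing. Returns None when the threshold lies
    outside the range spanned by the fitted curve.
    """
    if not (bottom < threshold < top):
        return None
    # Invert y = bottom + (top - bottom) / (1 + (ec50 / x) ** hill) for x.
    ratio = (top - bottom) / (threshold - bottom) - 1.0
    x_cross = ec50 / ratio ** (1.0 / hill)
    return 1.0 / x_cross

def log_endpoint_titer(bottom, top, ec50, hill, threshold=0.1):
    """log10 of the endpoint titer, or None if the curve never crosses the threshold."""
    titer = endpoint_titer(bottom, top, ec50, hill, threshold)
    return None if titer is None else math.log10(titer)
```

      Returning None for curves that never reach the threshold keeps non-responders distinguishable from low but measurable titers; how such samples are reported (for example, at the limit of detection) is a separate choice.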

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigma Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using Prism (GraphPad Prism v10.2.3). ELISAs and microneutralization assays comparing >2 samples were analyzed using a Kruskal-Wallis test with Dunn’s post-hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. ELISAs comparing two samples were analyzed using a Mann-Whitney test. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
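
      For readers unfamiliar with the rank-based test named here, the following Python sketch computes the tie-corrected Kruskal-Wallis H statistic from first principles. It is an illustration of the statistic only: the authors used Prism, and this sketch does not compute p-values (H is referred to a chi-square distribution with k-1 degrees of freedom for k groups) or Dunn's post-hoc comparisons. Function names are illustrative.

```python
from itertools import chain

def average_ranks(values):
    """Rank values from 1 to N, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current block of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic (assumes not all values are identical)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of (rank sum squared / group size) over groups.
    acc = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        acc += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * acc - 3.0 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied blocks.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))
```

      Because the statistic depends only on ranks, it is unaffected by the log transformation applied to the endpoint titers, which is one reason rank-based tests suit small serology cohorts.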

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. However, to address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated this data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Line 44-46, the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not those spilling over from avian sp. and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D and 4E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched virus- Microneutralization of matched and mis-matched pseudoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched sub-type, and striped bars indicate a mis-matched subtype (i.e. not present in the BOAS). NP negative controls were used to determine threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d., * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC on the figure for each plot.

      We recognize the reviewer’s concern with reporting serum titers in this way, and we now report titers as endpoint titers (EPT), with a dotted line marking the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450 nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
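
      The endpoint titer calculation quoted above can be illustrated with a minimal sketch. It assumes a standard 4PL parameterization, takes x to be the reciprocal of the dilution factor, and uses a base-10 log for the reported titer; none of these details is specified in the response. The parameter values are hypothetical, and the curve fitting itself (done with non-linear regression in the study) is not reproduced here.

```python
import math


def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic curve: absorbance as a function of serum concentration x."""
    return top + (bottom - top) / (1.0 + (x / ec50) ** hill)


def endpoint_concentration(threshold, bottom, top, ec50, hill):
    """Invert the 4PL curve: the concentration at which the curve crosses `threshold`."""
    if not (min(bottom, top) < threshold < max(bottom, top)):
        raise ValueError("threshold must lie strictly between the curve asymptotes")
    ratio = (bottom - top) / (threshold - top) - 1.0
    return ec50 * ratio ** (1.0 / hill)


# Hypothetical fitted parameters for one mouse (illustrative only, not study data).
params = dict(bottom=0.0, top=2.0, ec50=0.001, hill=1.0)

x_thr = endpoint_concentration(0.1, **params)  # concentration at the 0.1 absorbance threshold
ept_dilution = 1.0 / x_thr                     # reciprocal dilution, i.e. a titer of 1:ept_dilution
log_ept = math.log10(ept_dilution)             # log-transformed titer (base 10 is an assumption)
```

      Inverting the fitted curve analytically avoids interpolating between measured dilutions, but it is only meaningful when the threshold lies between the two fitted asymptotes, which the function checks.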

      It also appears that not all X-mer elicits an immune response against matched HA, e.g. for the 7 and 8 -mer. Not sure why the authors do not mention this. It could be due to too many HAs, not sure.

      We apologize for the confusion, and agree that our original method of reporting EC50 values does not reflect weak but present binding titers. After further analysis with additional mice and a change in how we report titers, it is easier to see in Figure 3D that all X-mer BOAS do indeed elicit detectable binding titers to matched HA components.

      It will be nice to add a conclusion to the cross-reactivity - again it appears that past 6-mer there has been a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCR for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to H1, H2, H3, H4 4-mer BOA, etc...

      We apologize for the lack of clarity in this figure. We updated Figure 5E to show the subtype each antibody is specific for, and we now list the antibodies with their subtype and targeted epitope in the figure caption.

      Minor

      Figure 1B zoom looks like the line is hidden to the structure - should come in front

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We did, however, want to verify that our platform was indeed a “plug-and-play” platform in which we could readily exchange components and order. We do hypothesize that a different order may lead to different antigenic results. We think that the conformation of the BOAS, as well as the physical and antigenic distance between HA components, may influence the cross-linking efficiency of BCRs and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in proximity to each other or could be potentially shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influence epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in close proximity to one another or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1, HA#2, HA#3 instead of HA1, HA2, HA3 to make sure it is not confounded with HA1 and HA2

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes and even 3D reconstructions? Line 148-149: maybe or just because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we do agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have added clarity to our statement in the discussion, which now reads:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion; we intend “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because we observed no binding, as expected. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also shows no binding. We recognize that in an ELISA the KD is an equilibrium measurement and that we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI), and have thus adjusted the figure caption to denote the values as “apparent K<sub>D</sub> values”.

      Line 169 - reads strangely: "BOAS-elicited serum, regardless of its length, reacted". The length is that of the immunogen, not the serum.

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194)

      Trimer interface and interface epitopes are used interchangeably - maybe keep it as trimer interface to be more precise

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected it to Figure 6G (line 231).

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dose and thus we explored the serum responses after a single boost. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across days d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included this data in Supplemental Figure 5, and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
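
      For readers unfamiliar with the rank-based test named above, the following is a minimal sketch of the Kruskal-Wallis H statistic (with the standard tie correction) and of the asterisk notation quoted in the methods text. It is not the authors' Prism workflow: the p-value lookup and Dunn's post-hoc comparisons are deliberately omitted, the "ns" label is an assumption, and the input values are hypothetical.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction (assumes not all values are equal)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1.0 - ties / (n ** 3 - n))


def stars(p):
    """Map a p-value to the asterisk notation quoted above ("ns" is an assumed label)."""
    for cutoff, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")):
        if p < cutoff:
            return label
    return "ns"


# Hypothetical log titers for three cohorts (illustrative only, not study data).
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

      In practice H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, or an exact distribution for small samples, to obtain the p-value passed to `stars`.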

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08), which does engage the receptor binding site. We also apologize for the oversight that H7.167 binds in the RBS periphery and not directly in the RBS. The inclusion of P2-D9 in the panel of RBS-directed antibodies was also an error, as we do not believe it is RBS-directed, although it is indeed H4-specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 component were not available, and the RBS-directed antibodies used targeting the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) was added at a concentration of 100µg/mL per antibody.”

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This was a clearly written manuscript that did an excellent job summarizing complex data.

      In this manuscript, Cuevas-Zuviría et al. use protein modeling to generate over 5,000 predicted structures of nitrogenase components, encompassing both extant and ancestral forms across different clades. The study highlights that key insertions define the various Nif groups. The authors also examined the structures of three ancestral nitrogenase variants that had been previously identified and experimentally tested. These ancestral forms were shown in earlier studies to exhibit reduced activity in Azotobacter vinelandii, a model diazotroph. This work provides a useful resource for studying nitrogenase evolution.

      However, its impact is somewhat limited due to a lack of evidence linking the observed structural differences to functional changes. For example, in the ancestral nitrogenase structures, only a small set of residues (lines 421-431) were identified as potentially affecting interactions between nitrogenase components. Why didn't the authors test whether reverting these residues to their extant counterparts could improve nitrogenase activity of the ancestral variants?

      We thank the reviewer for their thoughtful comments. We acknowledge that our current study is primarily focused on a computational exploration of the structural differences in both extant and ancestral nitrogenase variants, which allowed us to generate a comprehensive structural dataset. Although we did not carry out experimental reversion tests in this study, we agree that directly assessing the functional consequences of reverting the specific residues (lines 420 to 429) to their extant counterparts is an important next step to elucidate their functional role. Indeed, these findings provide a valuable foundation for our future work, which is designed to include experimental characterization of these variants and further elucidate the role of critical residues in nitrogenase activity and evolution. We believe that these experiments will offer the direct functional validation that the reviewer has rightly pointed out, and we look forward to reporting on these results in a future study.

      Additionally, the paper feels somewhat disconnected. The predicted nitrogenase structures discussed in the first half of the manuscript were not well integrated with the findings from the ancestral structures. For instance, do the ancestral nitrogenase structures align with the predicted models? This comparison was never explicitly made and could have strengthened the study's conclusions.

      We thank the reviewer for this suggestion. Our original analysis (previously shown in Figure S9, now Figure S10) included insights from structural alignment comparisons. In response, we have reorganized the results section (lines 351-355) to explicitly address this comparison.

      Reviewer #2 (Public review):

      This work aims to study the evolution of nitrogenases, understanding how their structure and function adapted to changes in the environment, including oxygen levels and changes in metal availability. The study predicts > 5000 structures of nitrogenases, corresponding to extant, ancestral, and alternative ancestral sequences. It is observed that structural variations in the nitrogenases correlate with phylogenetic relationships. The amount of data generated in this study represents a massive undertaking that is certain to be a resource for the community. The study also provides strong insight into how structural evolution correlates with environmental and biological phenotypes.

      The challenge with this study is that all (or nearly all) of the quantitative analyses presented are based on RMSD calculations, many of which are under 2 angstroms. For all intents and purposes, two structures with RMSD < 2 angstroms could be considered 'structurally identical'. A lot of insight generated is based on minuscule differences in RMSD, for which it is not clear that they are significantly different. The suggestion would be to find a way to evaluate the RMSD metric and determine whether these values, as obtained for structures being compared, are reliable. Some options are provided in earlier studies: PMID: 11514933, PMID: 17218333, PMID: 11420449, PMID: 8289285 (and others). It could also be valuable to focus more on site-specific RMSDs rather than Global RMSDs. The high conservation in the nitrogenases likely ensures that the global RMSDs will remain low across the family. Focusing on specific regions might reveal interesting differences between clades that are more informative regarding the evolution of structure in tandem with environment/time.

      We thank the reviewer for their suggestions. We agree that while global RMSD values below 2Å typically indicate high structural similarity, relying solely on these measures can mask subtle yet potentially functionally meaningful differences. Our aim was not to test for overall structural identity but rather to quantify fine-scale variations between highly conserved nitrogenase structures, including extant and ancestral variants. Nevertheless, in light of the reviewer’s suggestions, we have implemented an additional metric (rmsd<sub>100</sub>) for a more nuanced comparison. The results of our additional analyses (Figure S3) align closely with our original results (Figure 2), supporting our decision to retain the un-normalized results in the main text. As an additional measure, we also computed site-specific RMSDs for the active site’s environments (Figure S6) to further delineate subtle structural variations.
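
      As a point of reference, a size-normalized RMSD can be sketched as follows. This assumes that rmsd<sub>100</sub> refers to the widely used normalization of Carugo and Pongor, which rescales an RMSD to the value expected for a 100-residue alignment; the response does not state the formula, so the exact definition used by the authors may differ. The function name and inputs are hypothetical.

```python
import math


def rmsd_100(rmsd, n_residues):
    """Size-normalized RMSD: rmsd / (1 + ln(sqrt(N / 100))) for N aligned residues."""
    denom = 1.0 + 0.5 * math.log(n_residues / 100.0)  # ln(sqrt(x)) == 0.5 * ln(x)
    if denom <= 0.0:
        raise ValueError("normalization is undefined for very short alignments")
    return rmsd / denom


# Hypothetical example: the same 2.0 Å RMSD over alignments of different length.
same_size = rmsd_100(2.0, 100)   # unchanged at the 100-residue reference size
larger = rmsd_100(2.0, 400)      # scaled down for a longer alignment
```

      Because the correction grows only logarithmically with alignment length, proteins of similar size are rescaled by nearly the same factor, which is consistent with normalized and un-normalized results aligning closely.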

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree with their suggestions that some results could be more clearly contextualized and restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including adopting measuring ECG when possible now - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al. aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al. used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and for the many great ideas raised in the Weaknesses and Recommendations sections below. Their suggestions have sparked many ideas on how to better separate peripheral from neurophysiological signals, and these are likely to greatly influence our future attempts to extract both cardiac and muscle activity from M/EEG recordings. So we want to thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects in both the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands, we wanted to understand whether or not these age-related changes are statistically independent. To test this, we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”

      We hope this explanation solves the previous misunderstanding.
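
      The two-predictor model in Wilkinson notation (age ~ 1 + ECG rejected + ECG components) can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code of the manuscript: the variable names and simulated weights are hypothetical, and ordinary least squares is swapped in for the Bayesian robust regression described in the statistical inference section.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of subjects

# Synthetic stand-ins for channel-averaged spectral slopes per subject.
slope_ecg_rejected = rng.normal(size=n)
# The two predictors are deliberately correlated, as shared effects would imply.
slope_ecg_component = 0.5 * slope_ecg_rejected + rng.normal(size=n)

# Simulated age depends on both predictors with known (made-up) weights.
true_beta = np.array([50.0, 3.0, -2.0])  # intercept, ECG rejected, ECG component
X = np.column_stack([np.ones(n), slope_ecg_rejected, slope_ecg_component])
age = X @ true_beta + rng.normal(scale=1.0, size=n)

# Ordinary least squares fit of: age ~ 1 + ECG rejected + ECG components
beta_hat, *_ = np.linalg.lstsq(X, age, rcond=None)
print(dict(zip(["intercept", "ecg_rejected", "ecg_component"], beta_hat.round(2))))
```

      Because both predictors enter the same design matrix, each coefficient reflects the association with age after accounting for the other predictor, which is the sense in which the effects are tested for independence.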

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer makes a good argument that our initial assumption, namely that the presence of cardiac activity on the reference electrode influences the performance of the ICA, may be wrong. After rereading and rethinking the matter, we think that the reviewer is correct and that their explanations for why the ECG signal was not so easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as a main reason why the ECG rejection was more challenging in EEG data. However, we also note that understanding the exact reason probably ends up being an empirical question that demands further research, stating that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (like e.g. eye-blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that this applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye-movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and focussed solely on ECG-related activity, as described in the methods section under Literature Analysis. If no ECG-related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. We could indeed have highlighted this better and apologize for the oversight. We now additionally link to these details stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, it was often solely mentioned that “artifacts” were rejected. However, this information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly in writing and we apologize for this oversight. This is now included in the methods section Literature Analysis stating that:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”
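
      The automated flagging step described above can be illustrated with a short sketch. The term lists, the window size and the function names are hypothetical stand-ins (the actual search terms are given in the Methods), and the manual inspection of each word context is not modelled here.

```python
import re

# Hypothetical term lists, for illustration only.
CARDIAC_TERMS = re.compile(
    r"\b(ecg|ekg|cardiac|heart|electrocardiogra\w*)\b", re.IGNORECASE
)
REJECTION_TERMS = re.compile(
    r"\b(remov\w*|reject\w*|clean\w*|artifact\w*|artefact\w*)\b", re.IGNORECASE
)


def word_contexts(text, window=10):
    """Yield a window of words around every cardiac term in the text."""
    words = text.split()
    for i, word in enumerate(words):
        if CARDIAC_TERMS.search(word):
            yield " ".join(words[max(0, i - window): i + window + 1])


def mentions_cardiac_removal(text, window=10):
    """True if a cardiac term appears near an artifact-rejection term."""
    return any(REJECTION_TERMS.search(ctx) for ctx in word_contexts(text, window))


print(mentions_cardiac_removal("ICA was used to remove ECG components from the data."))
print(mentions_cardiac_removal("Artifacts were rejected using ICLabel."))
```

      A study that only reports rejecting “artifacts” in general is not flagged by such a rule, which is why the resulting percentages are best read as a lower bound.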

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type 1 errors by several means, such as A) weakly informative priors, B) robust regression models and C) by specifying a region of practical equivalence (ROPE, see Methods Statistical Inference for further Information) to define meaningful effects.

      Weakly informative priors can lower the risk of type 1 errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models use a Student's t-distribution to describe the distribution of the data. This distribution features heavier tails, meaning it allocates more probability to extreme values, which in turn minimizes the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke, 2018; Cohen, 1988).

      Furthermore, and more generally we do not selectively report “significant” effects in the situations in which multiple analyses were conducted on the same family of data (e.g. Figure 2 & 4). Instead we provide joint inference across several plausible analysis options (akin to a specification curve analysis, Simonsohn, Simmons & Nelson 2020) to provide other researchers with an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.
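
      To make the ROPE criterion concrete, the sketch below applies a common decision rule to posterior samples of an effect: the effect counts as meaningful only if the 95% highest-density interval (HDI) lies entirely outside the ROPE. The ROPE limits of ±0.1 and the function names are assumptions for illustration, and the exact criterion used in the manuscript may differ.

```python
def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    _, start = min((s[i + k - 1] - s[i], i) for i in range(n - k + 1))
    return s[start], s[start + k - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Classify an effect by comparing its HDI with the ROPE (hypothetical limits)."""
    low, high = hdi(samples, mass)
    if low > rope[1] or high < rope[0]:
        return "meaningful"   # HDI lies entirely outside the ROPE
    if rope[0] <= low and high <= rope[1]:
        return "negligible"   # HDI lies entirely inside the ROPE
    return "undecided"        # HDI overlaps a ROPE boundary


# Evenly spaced stand-ins for posterior samples of a standardized effect.
print(rope_decision([0.5 + 0.001 * i for i in range(100)]))
```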

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We want to thank the reviewer for suggesting this interesting paper, showing that lower high-pass filters may be preferable to the more commonly used >1Hz high-pass filters for detection of ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported findings by other researchers that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al. 2015; Dimigen, 2020 or Klug & Gramann, 2021) and recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). Yet, the fact that there seems to be a discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, it is notable that all the findings for high-pass filtering in ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns it is probably worth further investigating whether the classic 1Hz high-pass filter is really also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.

      We therefore highlight this now in our manuscript stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      I. Winkler, S. Debener, K.-R. Müller and M. Tangermann, "On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP," 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 2015, pp. 4101-4105, doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a potential concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We also agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian Potato would probably only capture epochs that contain transient, but not persistent, patterns of muscle activity.

      The paper recommended by the reviewer contains a clever approach of using the steepness of the spectral slope (or lack thereof) as an indicator of whether or not an independent component (IC) is driven by muscle activity. In order to determine an optimal threshold, Fitzgibbon et al. compared paralyzed to temporarily non-paralyzed subjects. They determined an expected “EMG-free” threshold for their spectral slope on paralyzed subjects and used this as a benchmark to detect ICs that were contaminated by muscle activity in non-paralyzed subjects.

      This is a great idea, but unfortunately would go way beyond what we are able to sensibly estimate with our data for the following reasons. The authors estimated their optimal threshold on paralyzed subjects for EEG data and show that this is a feasible threshold to be applied across different recordings. So for EEG data it might be feasible, at least as a first shot, to use their threshold on our data. However, we are measuring MEG and, as alluded to in our discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across different MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to do a complete validation of this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) would, even if we had access, face substantial hurdles to get ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings, whenever EMG was not directly recorded. We therefore suggested in the Discussion section that ideally also EMG should be recorded stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator to detect whether or not e.g. an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds are transferable across the wide variety of different M/EEG devices.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the author used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer we now try to give a clearer overview of what was investigated and why. We do that both at the end of the introduction stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity –normally attributed to neural processes– can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by highpass filtering the signal at 0.5Hz using an infinite impulse response (IIR) Butterworth filter (order = 5) and by smoothing the signal with a moving average kernel with the width of one period of 50Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-Peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding for instance low frequency (0.04-0.15Hz) and high frequency (0.15-0.4Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low frequency (0.25 – 20 Hz) and high frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
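
      As an illustration of the time-domain indices named above, the sketch below computes the mean R-R interval, the standard deviation of the intervals (SDNN) and the root mean square of successive differences (RMSSD) from a short list of R-R intervals. It uses only the Python standard library rather than NeuroKit2, and the function name and example intervals are hypothetical.

```python
import math
from statistics import mean, stdev


def time_domain_hrv(rr_ms):
    """Basic time-domain HRV indices from R-R intervals given in milliseconds."""
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_rr": mean(rr_ms),
        "sdnn": stdev(rr_ms),  # sample standard deviation of the intervals
        "rmssd": math.sqrt(mean(d * d for d in diffs)),
    }


rr_intervals = [800, 810, 790, 805, 795]  # made-up example values
indices = time_domain_hrv(rr_intervals)
print({name: round(value, 2) for name, value in indices.items()})
```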

      With regard to the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4Hz, which probably explains why we didn’t notice any association at higher frequency ranges (>10Hz).

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that definitely also requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150Hz, Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al. 2020). While most of the physiologically significant spectral information of an ECG recording rests between 1-50Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz) that overlaps substantially with our spectral analysis window. However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). The exact physiological mechanisms underlying the so-called high-frequency QRS remain unclear (HF-QRS; see Tereshchenko & Josephson, 2015; Qiu et al. 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al. 2004; Qiu et al. 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity with an overlapping frequency range up to 400Hz (Kusayama, Wong, Liu et al. 2020). We now highlight all of this as an outstanding research question when introducing this analysis in the results section, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, a lot of the physiologically meaningful spectral information rests between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). For instance, the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms).”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X. et al. Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nat Protoc 15, 1853–1877 (2020). https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now replaced “processing options” with “slope fits” to make it clearer that we are talking about the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples than reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer is making a very important point, as age-related differences in (neuro-)physiological activity are not necessarily strictly comparable and entirely linear across different age cohorts (e.g. age-related changes in alpha center frequency). We therefore added the suggested discussion points to the discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we are ageing and studies looking at younger age groups (e.g. <22; (44)) may capture different aspects of aging (e.g. brain maturation) than those looking at older subjects (>18 years; our sample). A recent report even shows some first evidence of an interesting putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59).”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was also exactly the intention for placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug induced changes to M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, we have rethought this approach based on the reviewer's comments. We moved that paragraph to the end of the results section and now introduce the analysis at the end of the introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/f^X function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations).

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the methods section under Literature Analysis. However, we added it to the Introduction on page 3 as suggested by the reviewer. The selection of articles was done automatically, dependent on a list of pre-specified terms and exclusively focussed on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant in calling it a comprehensive or nearly comprehensive list of the general M/EEG literature as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide and does not reflect something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D orange) and the literature analysis was performed almost 2 years ago and therefore, in my eyes, only represents a glimpse in the rapidly evolving field related to the analysis of aperiodic activity.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning is indeed only focussed on cardiac activity specifically. This was however also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the methods section under Literature Analysis. However, we adjusted figure 1EF to make it more obvious that the described cleaning methods were only related to the ECG. Aside from using blind source separation techniques such as ICA a good amount of studies mentioned that they cleaned their data based on visual inspection (which was not further considered). Furthermore, it has to be noted that only studies were marked as having separated cardiac from neural activity, when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15 Hz, I would like to refer the reviewer to Supplementary Figure S1. The knee frequency varies substantially across subjects, so splitting the data at a single exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in knee frequency (i.e. in only one of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give the readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. So to fulfill all of these objectives we decided to fit slopes with respective upper/lower bounds around a range of 5Hz above and below the average 15Hz Knee Frequency across datasets.

      The later parts of this same paragraph refer to a vast amount of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “... slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10 – 145 Hz) frequency ranges.”
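
      As an illustration only, fitting slopes over several lower and higher frequency ranges might be sketched as follows. This is not our analysis pipeline: the sketch fits an ordinary least-squares line in log-log space instead of using IRASA, the spectrum is synthetic, and the function names (`fit_slope`, `slopes_over_ranges`) and the example bounds are hypothetical choices made for this example.

```python
import math


def fit_slope(freqs, power, f_lo, f_hi):
    """Least-squares slope of log10(power) against log10(freq) within [f_lo, f_hi]."""
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_lo <= f <= f_hi]
    if len(pts) < 2:
        raise ValueError("need at least two frequency bins in the range")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


def slopes_over_ranges(freqs, power, lower_bounds, upper_bounds):
    """Fit one slope for every (lower, upper) pair with lower < upper."""
    return {(lo, hi): fit_slope(freqs, power, lo, hi)
            for lo in lower_bounds for hi in upper_bounds if lo < hi}


# Synthetic 1/f^1.5 spectrum sampled from 0.25 to 145 Hz (assumed toy data).
freqs = [0.25 * k for k in range(1, 581)]
power = [f ** -1.5 for f in freqs]

# Hypothetical bounds inside the lower (0.25-20 Hz) and higher (10-145 Hz) windows.
low = slopes_over_ranges(freqs, power, [0.25, 1.0, 5.0], [10.0, 15.0, 20.0])
high = slopes_over_ranges(freqs, power, [10.0, 15.0, 20.0], [45.0, 100.0, 145.0])
```

      With this sign convention a steeper spectrum corresponds to a more negative slope; real spectra with a knee or oscillatory peaks would of course give range-dependent values.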

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not most. However, the 39.4% is the lowest bound and only refers to 1 dataset. In the other 3 datasets the percentage of effects was above 64% which can be categorized as “most” i.e. above 50%. We agree that this was a bit ambiguous in the sentence so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. For some frequency ranges fewer effects were found, resulting in quarter circles instead of semi-circles.

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer, we adjusted our wording and replaced the arguably somewhat ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make it easier to understand the procedure.

      “After pre-processing (see Methods) the data was split into three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 3ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
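
      The three-way split described in this passage can be sketched as below. This is a toy illustration under stated assumptions, not our MNE-Python pipeline: the sources, the mixing matrix and all function names are hypothetical, and the use of the absolute correlation (because the sign of an independent component is arbitrary) is an assumption of the sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def back_project(mixing, sources, keep):
    """Rebuild sensor data from the components whose indices are in `keep`."""
    n_samples = len(sources[0])
    return [[sum(row[k] * sources[k][t] for k in keep) for t in range(n_samples)]
            for row in mixing]


def split_conditions(mixing, sources, ecg, threshold=0.4):
    """Return the three conditions plus the indices flagged as cardiac."""
    n_comp = len(sources)
    cardiac = [k for k in range(n_comp)
               if abs(pearson_r(sources[k], ecg)) > threshold]
    neural = [k for k in range(n_comp) if k not in cardiac]
    conditions = {
        "ecg_not_rejected": back_project(mixing, sources, range(n_comp)),
        "ecg_rejected": back_project(mixing, sources, neural),
        "ecg_component": back_project(mixing, sources, cardiac),
    }
    return conditions, cardiac


# Toy signals: component 1 is a scaled copy of the ECG, the others are orthogonal to it.
ecg = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
sources = [
    [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0],
    [2.0 * v for v in ecg],
    [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0],
]
mixing = [[1.0, 0.5, 0.2],   # sensor 0 weights for the three components
          [0.3, 1.0, -0.4]]  # sensor 1 weights

conditions, cardiac = split_conditions(mixing, sources, ecg)
```

      Because back-projection is linear, the "rejected" and "component only" data sum to the uncleaned data in this sketch; whether a real component is purely cardiac is a separate question that the threshold alone cannot settle.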

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients would depend both on the analyzed frequency range and the selected electrodes.

      Figure 4I: The regressions explained at this point seem to contain a very large number of potential predictors, as I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why this is a valid approach (if that is the case).

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
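
      To make the structure of this model concrete, the two-predictor regression (age ~ 1 + ECG rejected + ECG components) can be sketched as follows. This is a minimal illustration, not the code used in the study: the slope values are made up, the coefficients are unstandardized, no significance testing is performed, and `solve` and `ols` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, predictors):
    """Fit y ~ 1 + predictors via the normal equations; returns [intercept, beta_1, ...]."""
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up channel-averaged spectral slopes for six hypothetical subjects.
slope_rejected = [-1.0, -1.2, -0.9, -1.4, -1.1, -1.3]
slope_component = [-1.5, -1.9, -2.2, -1.6, -2.4, -1.8]

# Noise-free toy ages so that the fitted coefficients are known in advance.
age = [20.0 - 10.0 * r - 20.0 * c
       for r, c in zip(slope_rejected, slope_component)]

intercept, beta_rejected, beta_component = ols(age, [slope_rejected, slope_component])
```

      Each coefficient describes the association of one predictor with age while the other predictor is held constant, which is the sense in which the model asks whether the two conditions explain unique variance.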

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The wording of “cluster” was used to describe a “category” of effects e.g. null effects. We changed the wording from “cluster” to “category” to make this clearer stating now that: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub> , 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub> , 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>. ”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as the effect sizes seem to vary largely depending on the sample: for EEG, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensor may differ across manufacturers. Such a statement could be made more easily for source-projected data or if EEG electrodes were available, where the location would be normed, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: it was the authors of the cited manuscript, not us, who showed this. We corrected the sentence as suggested, which now states:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and actually also supported by Figure S7, as differences in ICA thresholds affect also the detection of age-related effects. We therefore adjusted the related sentences stating now that:

      “In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct and we adjusted the sentence accordingly! Stating now that:

      “Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG related activity on aperiodic activity measured using EEG.”

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point, as a larger number of electrodes would probably make it easier to isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question, so we highlighted it in the discussion section, stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used hyphens whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording of “doing an ICA” is a bit sloppy. We therefore reworded it accordingly throughout the manuscript to e.g. “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point and we also highlight this in our recommendations under 3 stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity(76,77). This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      The reviewer is indeed correct: we used the standard fastICA setting implemented in MNE-Python, which calls the FastICA implementation in scikit-learn; this uses the “parallel” (symmetric) algorithm by default. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons; for the analysis of EEG data no threshold was applied).”

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backwards direction (perhaps the meaning of forwards and backwards could be explained in more detail also - i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the case of the forward model we predicted the signal of the M/EEG channels in a multivariate regression model using the ECG electrode as predictor. In case of the backward model we predicted the ECG electrode based on the signal of all M/EEG channels. The forward model was used to depict the time window at which the ECG signal was encoded in the M/EEG recording, which appears at 0 time lags indicating volume conduction. The backward model was used to see how much information of the ECG was decodable by taking the information of all channels.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
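
      The distinction between the two directions can be illustrated with a small sketch that treats the TRF as a regression over time lags. The assumptions here are considerable: the signals are simulated, plain least squares is used without any regularization, the backward model uses zero lag only, and the helper names (`solve`, `least_squares`, `lagged_rows`) are hypothetical. It is not the implementation used for the reported analyses.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Regression weights (no intercept) from the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def lagged_rows(signal, n_lags):
    """Rows [x(t), x(t-1), ..., x(t-n_lags+1)] for every t with a full history."""
    return [[signal[t - lag] for lag in range(n_lags)]
            for t in range(n_lags - 1, len(signal))]


rng = random.Random(0)
n = 300
ecg = [rng.gauss(0, 1) for _ in range(n)]
brain = [rng.gauss(0, 1) for _ in range(n)]
ch0 = [0.8 * e + b for e, b in zip(ecg, brain)]  # ECG leaks in at zero lag
ch1 = brain[:]                                   # sensor without any ECG

# Forward (encoding) model: predict each channel from the time-lagged ECG.
n_lags = 3
design = lagged_rows(ecg, n_lags)
forward = [least_squares(design, ch[n_lags - 1:]) for ch in (ch0, ch1)]

# Backward (decoding) model: predict the ECG from all channels jointly.
backward = least_squares(list(zip(ch0, ch1)), ecg)
```

      In this toy case the forward weights for the contaminated channel concentrate at lag 0, and the backward model recovers the ECG by combining both channels, which no single channel could do on its own.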

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also).

      This was indeed tested in a previous review round to ensure that our results are not dependent on the presence/absence of a knee in the data. We therefore added figure S4, but forgot to actually add a description in the text. We are sorry for this oversight and added a paragraph to S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects, when not additionally cleaning the data using SSS. However, I am a bit wary of recommending omitting the use of SSS maxfilter solely based on this information. It can very well be that the higher quantity of effects (when not employing SSS maxfilter) stems from other physiological sources (e.g. muscle activity) that are correlated with age and removed when applying SSS maxfiltering. I think that just conditioning the decision of whether or not maxfilter is applied based on the amount or size of effects may not be the best idea. Instead I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether or not the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity; that may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has, to our knowledge, not been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. The maxfilter set to false in AB) means that no maxfiltering using SSS was performed vs. in CD) where the data was additionally processed using the SSS maxfilter algorithm. We now describe this more clearly by writing in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominently explained by cardiac components, irrespective of whether the data were maxfiltered using signal space separation (SSS) or not. AC) Age was used to predict the spectral slope (fitted at 0.1-145Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that the DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, is able to lead to the formation of memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, silencing of DAN-f1 together with DAN-g1, enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that also in the Drosophila larva as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines which allow for testing of their function individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried out throughout the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the authors' conclusions. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have made the optogenetic results central and moved some of the ablation experiments to the Supplement. We also discuss the experimental differences between the results in detail. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now place more emphasis on the general role of DANs in aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for their positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity we attributed to the individual DANs is not always fully supported by the results of the work. We have therefore rewritten the relevant passages, including those cited by the reviewer. We have also addressed the second point of criticism in our manuscript. As the control groups were always measured in parallel with the experimental animals, we can present the data together in a sub-figure. We now state this clearly in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for their careful review of our manuscript. In the subsequent sections, we aim to address their concerns as thoroughly as possible. A comprehensive point-by-point response can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not, by itself, allow us to comment on salt-specificity. However, we used high salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al 2008; Schleyer et al. 2011; Schleyer et al. 2015). The US used during training has to be present during the test. Other aversive stimuli (e.g. bitter quinine), even at the same concentration, do not allow the larvae to recall this particular type of memory. So, if the optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we see the necessity for salt learning only for DAN-g1. And although it is very unlikely based on the current literature, we cannot fully exclude that the activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have accordingly rephrased the entire manuscript to avoid an over-interpretation of our data. Throughout the manuscript we now avoid the term salt-specific memory and instead describe the type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point, every behavioral experiment that uses optogenetic blue light activation was run with the appropriate, mandatory controls. For blue light activation experiments, two genetic controls are used that receive either the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments, one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor a positive memory, and that it does not impair the establishment of odor-high salt memory. In addition, we used the most recently established transgenes available. ChR2<sup>XXL</sup> is very sensitive to blue light: only 220 lux (60 µW/cm<sup>2</sup>) were necessary to obtain stable results. In our hands, short-term exposure of up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we have also addressed this concern by adding several sentences to the related results and methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is quite possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for the differences in phenotypes between hid,rpr ablation and optogenetic inhibition. This is now part of the discussion. In addition, we evaluated whether the ablation worked in our experiments. Until now, controls showing that the expression of hid,rpr really leads to the ablation of DANs were missing. We have now added these experiments and clearly show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. The new version of our manuscript therefore integrates the EM data more thoroughly into our interpretation of the results. We have rewritten the related parts throughout the manuscript and have completely revised the corresponding section in the results chapter.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We recognize that we had not given the differing results enough space in the discussion. We have now changed this and address the concern comprehensively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it's plausible that the UAS-GtACR2 effector doesn't entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It's also noteworthy that TH-Gal4 doesn't encompass all dopaminergic neurons; even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Considering we're utilizing high salt concentrations in this experiment, it's conceivable that memories not driven by gustation are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move it to supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 is affecting 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set as a supplement). When a difference appears (Fig.5A-D) the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM), as we have shown previously in Widmann et al. 2016. We are therefore convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed, and we have made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for your criticism, which we greatly appreciate. We have carefully re-examined the data and found that there was a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.
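
      To illustrate the kind of normalization at issue, the following minimal sketch computes ΔF/F relative to a pre-stimulus baseline, so that the baseline window averages to zero by construction. It is not the pipeline used in the manuscript: the function name `delta_f_over_f`, the choice of the first samples as the baseline window, and the example values are all assumptions made for illustration.

```python
from statistics import mean


def delta_f_over_f(trace, baseline_end):
    """Return dF/F for one fluorescence trace.

    F0 is the mean of the samples before index `baseline_end`
    (a hypothetical pre-stimulus window), so the normalized baseline
    window averages to zero.
    """
    if not 0 < baseline_end <= len(trace):
        raise ValueError("baseline_end must lie within the trace")
    f0 = mean(trace[:baseline_end])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero; dF/F is undefined")
    return [(f - f0) / f0 for f in trace]


# Made-up example: four baseline samples, then a response.
trace = [100.0, 102.0, 98.0, 100.0, 120.0, 130.0]
dff = delta_f_over_f(trace, baseline_end=4)
```

      If the baseline window of a normalized trace sits consistently above or below zero, as the reviewer observed, F0 was probably taken from a different window or trace than intended.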

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have reworded the respective part of the abstract accordingly, aiming to avoid any exaggeration.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomical data of the DL1 DANs from electron microscopy (EM) and light microscopy (LM) indicates that their overall morphology remains consistent. However, it's important to note that this observation does not address the physiological properties of these cells. Consequently, we have incorporated this concern into the discussion of the revised version of the manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph at the end of the introduction, highlighting the biological question, the technical approach, and a brief hint at the findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by studies such as Schleyer et al. (2015) published in eLife, clearly demonstrates that the recall of odor-high salt memory occurs exclusively when tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the fact that the motivation to recall the memory is triggered by the omnipresence of the unconditioned stimulus (US) during the test, which in our case is high salt. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; if high salt is absent, however, this motivational drive diminishes because the situation is already favorable and the memory offers nothing to improve it. Additionally, the motivation to evade the omnipresent and unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that DAN-f1 alone is not necessary for mediating a high salt teaching signal based on ablation, optogenetic inhibition and even physiology. However, optogenetic activation alone establishes a memory that is expressed when tested on a salt plate. Given the logic explained above, which is accepted in several publications, we would like to keep the statement, especially as joint manipulation with DAN-g1 gives rise to significantly higher or lower values after joint optogenetic activation or inactivation (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as “these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested”. We think that this is an appropriate and three-fold restricted statement. Therefore, we would like to keep it in this restricted version. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have now carefully rewritten this part of the manuscript. See also general concern 4 above. The main take-home message is not to explain why some DL1 neurons respond to salt and others do not. This cannot be resolved because information on the salt-perceiving receptor cells is missing. Unfortunately, the peripheral nervous system, the first layer of salt information processing, is not included in the EM data. However, our analysis clearly shows that each of the 4 DANs has its own identity based on its connectivity. None of them is the same, but to a certain extent similarities exist. This nicely reflects the physiological and behavioral results. We have now clarified this in the Results to ease understanding for the readership. In addition, we also clearly state that we do not address the question of why some DL1 neurons respond to salt and others do not.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al 2015, Figure 3, eLife). Even the naïve choice between two odors is the same if tested in the presence of different tastants (Schleyer et al 2015, Figure 3, eLife). This shows that, at least for the tested stimuli and conditions, which are similar to the ones that we use, there is no multisensory-driven innate odor-taste behavior. Accordingly, to the best of our knowledge, experiments like the ones suggested by the reviewer have never been done in larval odor-taste learning studies. We therefore suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1 DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to this kind of figure. We have therefore modified the text to ease understanding and to clarify the difference in the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      True. This concern is correct. We now discuss the difference in more detail. Eschbach et al. used CsChrimson as a genetic tool, a one-odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluste.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed Figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now presented in Figure 1 as well as in the corresponding supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      Yes, high intensities of blue light were previously necessary to trigger the first-generation optogenetic tools. The high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). Use of the second-generation optogenetic tools allowed us to strongly reduce the applied light intensity. We now use 220 lux (equal to 60 µW/cm<sup>2</sup>). Please note that all Gal4 and UAS controls in the manuscript are not significantly different from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity for several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never seen an effect on behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen of larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. In that screen, full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Additional expression in brain cells was detectable neither in their evaluation nor in our anatomical evaluation (using a different protocol). We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether the expression of hid,rpr in DL1 DANs really ablates them. Therefore we did an additional experiment to show this. The new data are now presented as a supplemental figure (Figure 4-figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments in which we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The general variability of these experiments is within the expected and known range. In this kind of experiment there is always some variation due to several external factors (e.g. the time of year at which the experiment is run). Therefore it is always important to measure control and experimental animals at the same time. That is what we did, and we only directly compare results within individual datasets, not between different datasets. Comparison between datasets is further hampered because the experiments of Figure 4 (now Figure 4-figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from the other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of “real world” high salt. Direct activation of specific DANs in the brain is most often more stable than external high salt stimulation. Optogenetic inactivation also uses blue light stimulation, together with retinal-supplemented food. Both factors can affect the measurement. We therefore argue that, for each experiment, it is most often the particular parameters that affect the variability of the results rather than background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. DAN-c1/d1 also respond to high salt but do not show a function on their own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True, we screened intensively for DL1 cluster-specific driver lines that cover all 4 DL1 neurons or combinations other than the ones we tested. Unfortunately, we did not succeed in identifying them. Nevertheless, we will screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our comment on concern 1 of reviewer 1 for further technical limitations and biological questions that could also explain the absence of complete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just amend that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but to be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments - inferring that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      True, this is rather confusing. Based on this well-taken concern we have changed the figure by adding a new and correct scheme for sugar reward learning that does not depict fructose during the test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway. The original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to hide this work, as our laboratory was also involved in it. Nevertheless, this is an annoying omission on our part.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism. Therefore, we changed the sentence as suggested by the reviewer. See also our response on concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.
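
      The point raised here can be illustrated with a minimal sketch. It assumes a convention that is common in larval odor-taste learning assays, not necessarily the equation used in the manuscript: each preference score is computed from larval counts, and the performance index is half the difference between the preferences of two reciprocally trained groups. Under that assumption the equation is a "difference" divided by two, which is easy to confuse with an "average". The function names and the counts are hypothetical.

```python
def preference(n_odor_side: int, n_other_side: int, n_total: int) -> float:
    """Preference score: positive if more larvae are on the odor side.

    Assumed convention, not taken from the manuscript.
    """
    return (n_odor_side - n_other_side) / n_total


def performance_index(pref_paired: float, pref_reciprocal: float) -> float:
    """Half the DIFFERENCE between two reciprocally trained groups (assumed form)."""
    return (pref_paired - pref_reciprocal) / 2


def mean_preference(pref_a: float, pref_b: float) -> float:
    """Plain average of two preferences, shown only for contrast."""
    return (pref_a + pref_b) / 2


# Hypothetical counts for two reciprocally trained groups of 30 larvae each.
pref_1 = preference(20, 10, 30)
pref_2 = preference(12, 18, 30)

pi = performance_index(pref_1, pref_2)
avg = mean_preference(pref_1, pref_2)
print(f"PI (half difference) = {pi:.3f}, plain average = {avg:.3f}")
```

      With these hypothetical counts the two quantities differ clearly, which is why the text and the equation need to agree on whether a difference or an average is meant.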

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern we have rewritten the discussion as suggested to be more precise when talking about redundancy or combinatorial coding of the aversive teaching signal. Basically, we have removed all the combinatorial terms and replaced them with the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and have now added a supplemental file that includes all the information on the statistical tests applied and the distribution of the data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study experimentally examined diet-microbe-host interactions through a complex systems framework, centered on dietary oxalate. Multiple, independent molecular, animal, and in vitro experimental models were introduced into this research. The authors found that microbiome composition influenced multiple oxalate-microbe-host interfaces. Oxalobacter formigenes was only effective against a poor oxalate-degrading microbiota background, which gives critical new insights into why clinical intervention trials with this species exhibit variable outcomes. Data suggest that, while heterogeneity in the microbiome impacts multiple diet-host-microbe interfaces, metabolic redundancy among diverse microorganisms in specific diet-microbe axes is a critical variable that may impact the efficacy of bacteriotherapies, which can help guide patient and probiotic selection criteria in probiotic clinical trials.

      Thank you. The main message of this research is that, through complex modelling, we believe we have identified the critical variable (metabolic redundancy) that is responsible for the efficacy of probiotics designed to reduce oxalate levels, thus allowing for improved patient selection in clinical trials. We also believe that this process and the critical features identified can be translated to other critical microbial functions such as short chain fatty acid synthesis, secondary bile acid synthesis, and others.

      Strengths:

      The paper has made significant progress in both the depth and breadth of scientific research by systematically comparing multiple experimental methods across multiple dimensions. Particularly through in-depth analysis from the enzymatic perspective, it has not only successfully identified several key strains and redundant genes, which is of great significance for understanding the functions of enzymes, the characteristics of strains, and the mechanisms of genes in microbial communities, but also provided a valuable reference for subsequent experimental design and theoretical research.

      More importantly, the establishment of a novel research approach to probiotics and gut microbiota in this paper represents a major contribution to the current research field. The proposal of this new approach not only breaks through the limitations of traditional research but also offers new perspectives and strategies for the screening, optimization of probiotics, and the regulation of gut microbiota balance. This holds potential significant value for improving human health and the prevention and treatment of related diseases.

      Thank you for the comments. We believe that the approach taken here, which contrasts with conventional reductionist techniques, will be critical for translating gut microbiome research into actionable therapeutic approaches.

      Weaknesses:

      While the study has excellently examined the overall changes in microbial community structure and the functions of individual bacteria, it lacks a focused investigation on the metabolic cross-feeding relationships between oxalate-degrading bacteria and related microorganisms, failing to provide a foundational microbial community or model for future research. Although this paper conducts a detailed study on oxalate metabolism, it would be beneficial to visually present the enrichment of different microbial community structures in metabolic pathways using graphical models.

      Thank you for this critique. In the current study, we broadly examined the response of the gut microbiota to dietary oxalate. Based on initial shotgun metagenomic results, we focused on specific taxa and metabolic functions. Through metagenomic and multiple culture-based studies, we quickly homed in on redundancy in oxalate-degrading function as a key feature for oxalate homeostasis. We believe that the defined microbial community we used for microbial transplants (particularly the taxonomic cohort) provides a strong, minimal community to explore oxalate homeostasis further. In fact, we are using this consortium in multiple follow-up studies to fully understand the cross-feeding that may occur among these microorganisms, as you suggest. We note that Figure 3 shows the change of species and metabolic pathways with oxalate exposure.

      Furthermore, the authors have done a commendable job in studying the roles of key bacteria. If the interactions and effects of upstream and downstream metabolically related bacteria could be integrated, it would provide readers with even more meaningful information. By illustrating how these bacteria interact within the metabolic network, readers can gain a deeper understanding of the complex ecological and functional relationships within microbial communities. Such an integrated approach would not only enhance the scientific value of the study but also facilitate future research in this area.

      Thank you. We note that, based on the collective data obtained in this study, redundancy in oxalate degradation is the critical feature that maintains oxalate homeostasis. However, we are interested in potential metabolic interactions between microbes in our defined community and are currently investigating these interactions in follow-up studies.

      Reviewer #2 (Public review):

      Summary:

      Using the well-studied oxalate-microbiome-host system, the authors propose a novel conceptual and experimental framework for developing targeted bacteriotherapies using a three-phase pre-clinical workflow. The third phase is based on a 'complex system theoretical approach' in which multi-omics technologies are combined in independent in vivo and in vitro models to successfully identify the most pertinent variables that influence specific phenotypes in diet-host-microbe systems. The innovation relies on the third phase since phase I and phase II are the dominant approaches everyone in the microbiome field uses.

      Thank you. As you note, the proposed phases I and II are the predominant approaches used. In fact, many clinical trials have been conducted to try to reduce urine oxalate in patients, based solely on mechanistic studies with Oxalobacter formigenes. As noted in our manuscript, only 43% of those studies resulted in the intended outcome, necessitating the approach we took in the current study. Our results suggest that the reason for the high rate of failure, despite well-established mechanisms, is insufficient patient selection that focused only on the presence or absence of O. formigenes, a species that normally exhibits very low prevalence and abundance in the human gut microbiota.

      Strengths:

      The authors used a multidisciplinary approach which included:

      (1) fecal transplant of two distinct microbial communities into Swiss-Webster mice (SWM) to characterize the host response (hepatic response-transcriptomics) and microbial activity (untargeted metabolomics of the stool samples) to different oxalate concentrations;

      (2) longitudinal analysis of the N. albigula gut microbiome composition in response to varying concentrations of oxalate by shotgun metagenomics, with deep bioinformatic analyses of the genomes assembled; and

      (3) development of synthetic microbial communities around oxalate metabolism and evaluation of these communities' activity in oxalate degradation in vivo.

      Thank you for these comments.  In the complex modelling approach, we focused on complete microbiota from host species known to have high and low capacities for oxalate tolerance, combined with targeting specific metabolic functions vs. specific taxa that may include unknown functions important for oxalate metabolism.  Further, we examined the influence of our target communities on oxalate metabolism through multiple in vitro and in vivo studies.

      Weaknesses:

      However, I have concerns about the frame the authors tried to provide for a 'complex system theoretical approach' and how the data are interpreted within this frame. Several of the conclusions the authors provide do not seem to have sufficient data to support them.

      Thank you.  We have tried to address these concerns by adding an exhaustive figure that broadly represents our complex modelling approach that includes potential complex system-based hypotheses, how they were tested, and the host-microbiome-oxalate interactions found in our study.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      (1) The authors argue for the importance of bringing 'Complex System Theory' to the microbiome field systematically and consistently. However, the authors fail to introduce this theory throughout the entire manuscript. For example, the authors tried to describe key elements and their nomenclature, such as nodes and fractal layers, in the first part of the results section. But the description is wordy and not precise. It would be more useful if the authors connected the model description with a visual representation, such as a figure. Unfortunately, these elements are not emphasized or carried through the results section and are not mentioned in the discussion section.

      We have now added a figure (Figure 7) that details this process extensively and ties each of our findings to the complex system model and nomenclature.  We have also reiterated how our results fit in the complex system model in the discussion.

      In addition, there is no straightforward approach to integrating multi-omics datasets to identify the variables that are determinants of the system. For example, Figure 1 focuses on the host (hepatic) response to oxalate exposure after fecal transplants into Swiss Webster mice; Figure 2 focuses on the effects of oxalate exposure on stool metabolic activity, not only microbial metabolic activity, after fecal transplants into Swiss Webster mice; and Figure 3 focuses on microbiome responses to different oxalate concentrations in Neotoma albigula. There is no "model" to really integrate the host, the microbiome activity, and the microbiome composition information. And, unfortunately, the data generated between experiments cannot be directly integrated; see major concern # 2.

      Thank you. We have clarified the experimental approach and how it applied to understanding the critical factors that maintain oxalate homeostasis. Specifically, Figure 1 established that the effect of oxalate on the host was dependent on the microbiota, rather than host genetics. Figure 2 established that the effect of oxalate on the gut microbiota was again dependent on the whole gut microbiota and that these oxalate-microbe effects also influenced oxalate-host effects, as shown through a direct multi-omic data integration. Once we established that the oxalate effects on host and microbiota were dependent on the whole microbiota composition, Figure 3 then sought to determine how oxalate impacted the gut microbiota, using our model of high oxalate tolerance (N. albigula). With the finding in Figure 3 that there were multiple genes attributed to the degradation of oxalate, or to acetogenic, methanogenic, and sulfate-reducing pathways, Figure 4 and relevant supplemental figures sought to quantify the redundancy of these pathways. After establishing a very high degree of redundancy, we used a culturomic approach to determine what environmental factors impacted oxalate metabolism and to evaluate oxalate metabolism using our defined, hypothesized communities of microorganisms. Finally, Figure 6 sought to validate our metagenomic, metabolomic, and culturomic results from multiple animal and in vitro models using targeted microbial transplants in mice. While we did have some direct multi-omic data integration (Figures 2 and 3), the process employed here sought to systematically determine which factors were most important for the oxalate-microbiota-host relationship, and then to use those results to design the subsequent experiments. We have added this description to the discussion, which helps to contextualize the complex system modelling approach we took here.

      Finally, the authors did not provide a novel variable that successfully influences oxalate degradation in the oxalate-microbiome-host system. The authors argue that "both resource availability and community composition impact oxalate metabolism," which can already be inferred from the failure of the clinical trials, and they do not provide a clear intervention strategy to develop functional bacteriotherapy. The identification of composition as an important variable, which was predictable without any multi-omics approach, was highlighted by the development of synthetic microbial communities. Synthetic microbial communities are critical to characterizing complex microbiomes. Still, the authors did not explain how this strategy can be used in their theoretical framework (that is their goal), and these communities are not well introduced across the manuscript; see major concern # 4.

      As stated, it is clear from the failed clinical trials that we do not fully understand what microbial features dictate oxalate homeostasis.  We have specifically identified, through fecal transplant studies, that microbial composition is critical for oxalate homeostasis and that diverse oxalate-degrading bacteria exist.  However, ours is the first study that explicitly shows that it is this diversity that controls oxalate homeostasis.  This is specifically ascertained through the targeted microbial transplants in mice whereby O. formigenes was given alone or with different combinations of other microorganisms.  In other words, we were able to replicate both successful and failed studies by manipulating which specific species were introduced into animals.  This is unprecedented in the literature.

      (2) The authors provide several conclusions that are not completely supported by the data available. For example:

      (a) Lines 236-239: "Within the framework of complex systems, results show microbe-host cooperation whereby oxalate effectively processed within the SW-NALB gut microbiota reduced overall liver activity, indicative of a beneficial impact." - The authors did not provide data related to oxalate levels or oxalate processing for this dataset.

      While we did not specifically quantify oxalate degradation for this specific study, as cited in the text when describing this Swiss-Webster, Neotoma albigula system, we have previously published multiple animal studies explicitly showing that the N. albigula animals were highly effective oxalate degraders, a trait that is transferable to Swiss-Webster mice through fecal transplants. Since the gut microbiota’s impact on oxalate has been well established through experiments by our group, the purpose of these specific experiments was to look the other way and examine the effect of oxalate on the gut microbiota of these two animal models. In the referenced text, we again cited our studies showing that the SW-NALB system effectively degrades oxalate.

      (b) Lines 239-243: "Data also suggest that both the gut microbiota and the immune system are involved in oxalate remediation (redundancy), such that if oxalate cannot be neutralized in the gut microbiota or liver, then the molecule will be processed through host immune response mechanisms (fractality), in this case indicated through an overall increase in hepatic activity and specifically in mitochondrial activity." - The authors did not provide any evidence related to the immune system and oxalate metabolism.

      We corrected that statement as follows: “…in this case indicated through an overall increase in inflammatory cytokines with oxalate exposure combined with an ineffective oxalate-degrading microbiota (Figures S6a,b; S9a,b).”  In other words, if the liver and gut microbiota can’t eliminate a toxin, then the immune system must deal with it through inflammatory pathways.  Oxalate is a well established, pro-inflammatory compound.  Our data show that this is dependent on the gut microbiota.

      (c) Lines 250-252: "Following the diet trial, colon stool was collected post-necropsy and processed for untargeted metabolomics, which is a measure of total microbial metabolic output." - Although most metabolites in stool samples are indeed microbial, there are also host metabolites. So, it is not technically correct to equate the metabolomic analysis of stool samples with microbial metabolic analysis alone. In addition, the authors discussed compounds such as alkaloids and cholesterol as microbial metabolites, although these compounds are more closely related to the diet and the host, respectively.

      We have corrected this to state: “total metabolites present in stool from the diet, microbial activity, and host activity”

      (d) Lines 270-273. "Specifically, the SW-NALB mice exhibit hallmarks of homeostatic feedback with oxalate exposure to maintain a consistent metabolic output, defined by the relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice." - How do the authors define oxalate homeostasis? In addition, do the authors imply feedback between the liver and the microbiome in which the microbiome responds to a liver response related to oxalate levels? Or could the observation in Figure 1 be explained just by microbial consumption of oxalate that would reduce the impact of oxalate that arrives at the liver?

      Oxalate homeostasis is defined in that sentence: “relatively small, net negative, microbial metabolite-hepatic gene network compared to the large, net positive, network of SW-SW mice” – in other words, for SW-NALB mice, oxalate did not produce a considerable change to either microbial or hepatic metabolic activity.  We did not really test the liver impact on gut microbiota and can’t speak to that.  We believe, based on Figure 2 data, that it is not just the degradation of oxalate that explains the lack of change in hepatic activity in SW-NALB mice, rather that the oxalate-induced shift in the gut microbiota metabolic activity broadly altered hepatic activity, as inferred from Figure 2 c.  We made this more clear in the results: “suggests that the oxalate-induced change in microbial metabolism is responsible for the change in hepatic activity”.

      (e) Lines 297-301: "The oxalate-dependent metagenomic divergence of the NALB gut microbiota (Figure 3), combined with the lack of change in the microbial metabolomic profile with oxalate exposure (Figure 2), suggest that oxalate stimulates taxonomically diverse, but metabolically redundant microorganisms, in support of maintaining homeostasis." - The authors cannot draw any conclusion relating taxonomic changes to microbial activity, since the taxonomic data presented are for microbial enrichment in N. albigula, and the "microbial activity data" are from the fecal transplantation experiment in SWM. These are two completely different systems with two completely different experimental designs.

      We have shown very similar results in multiple previous studies, namely that oxalate induces taxonomic divergence in the NALB gut microbiota. The experiment in which a minimal, positive increase in microbial metabolites was seen with oxalate was based on the SW-NALB model, whereby Swiss-Webster mice have an NALB microbiota. We show throughout the manuscript that the impact of oxalate is strongly microbiota dependent, which supports our claim. However, the claim is hypothesis generating: that metabolic redundancy is important for oxalate homeostasis. We modified our statement to make all of this more clear.

      Related to microbial composition, the authors did not show data validating the efficiency of the fecal transplantations (allograft or xenograft) in the SWM after antibiotic treatment. They also did not show evidence of microbial composition dynamics in response to oxalate exposure.

      Again, the efficacy of fecal transplants, used in the way they were here, has been shown in multiple past studies of our group.  In past studies, we have extensively characterized the microbiota from fecal transplants and which taxa were associated with oxalate levels.  Therefore, that topic was not the focus of the current study, instead focusing on the oxalate impact on gut microbiota activity.  Our past studies, referenced multiple times through the current manuscript, were used in large part to help determine which microbes to include in our taxonomic cohort, as described in the manuscript.

      (f) Lines 301-303: "Given that data came from the same hosts sampled longitudinally, these data also reflect a microbiota that is adaptive to oxalate exposure, which is another important characteristic of complex systems." - In their dataset, what is the evidence that the microbiota of N. albigula is adapted to oxalate exposure? Is the increase in genomes with pathways related to oxalate metabolism related to an increase of oxalate in the diet? If so, does exposing the microbiota to a higher oxalate concentration decrease the systemic level of oxalate? In none of the experiments related to Figures 1 to 3 did the authors show a correlation of systemic oxalate levels with microbial composition, hepatic host response, or stool metabolism.

      Figure 3 explicitly shows the longitudinal impact of increasing levels of oxalate, with an increase in oxalate-degrading genes (Figure 3d). The specific samples selected for analysis here come from a previous study in which we explicitly quantified changes to the gut microbiota composition and both stool and urine oxalate for every time point listed in Figure 3a. This information is explicitly stated in the methods, coupled with the fact that “neither fecal nor urinary oxalate levels increased significantly.” Again, the effect of the gut microbiota on oxalate in these model systems has been extensively studied by our group and provides the foundation for the current study to look at the effect of oxalate on the gut microbiota and host.

      Considering my last two points, the authors do not present substantial evidence to support their hypothesis that oxalate stimulates taxonomically diverse, metabolically redundant communities.

      As stated above, that oxalate stimulates taxonomically diverse taxa was ascertained through multiple past studies, as well as the current study (Figure 3e).  The metabolically redundant part is ascertained both through untargeted metabolomics (Figure 2a,b) and shotgun metagenomics (Figure 3c,d).  Further evidence for the metabolic redundancy with oxalate comes from our culturomic approach, which showed that 14.58% of isolates could grow on oxalate as a carbon and energy source, in addition to the high proportion of isolates that could grow on other carbon and energy sources, at least much more than can be ascribed to a single species  (Figure 5c).  We made this more clear in the discussion.

      (g) Lines 330-335. "Additionally, the broad diversity of species that contain oxalate-related genes suggests that the distribution of metabolic genes is somewhat independent of the distribution of microbial species, which suggests that microbial genes exist in an autonomous fractal layer, to some degree. This hypothesis is supported by studies which show a high degree of horizontal gene transfer within the gut microbiota as a means of adaptation." - This conclusion is highly speculative, especially since the author did not do any analysis to directly evaluate a relationship between the oxalate metabolic pathways and the microbial species where these pathways are present.

      Figure 3c,d,e explicitly shows the metabolic pathways and species enriched by oxalate exposure.  Figure 4d, generated using the same data from Figure 3, explicitly shows the taxa that harbor oxalate-degrading genes.   

      (h) Lines 364-366. "Collectively, data show that both resource availability and community composition impacts oxalate metabolism, which helps to define the adaptive nature of the NALB gut microbiota." - The authors indeed showed evidence that community composition impacts oxalate metabolism. However, the authors did not show any evidence to directly evaluate the resource availability to impact oxalate metabolism.

      This is explicitly shown through in vitro community-based and single species assays varying multiple different carbon and energy sources to quantify changes to oxalate degradation (chosen based on shotgun metagenomic results; Figure 5a,b).

      (3) Lines 321-325. "Acetogenic genes were also present in 97.18% of genomes, dominated by acetate kinase and formate-tetrahydrofolate ligase (Figure S3A-C). Methanogenic genes were present in 100% of genomes, dominated by phosphoserine phosphatase, ATP-dependent 6-phosphofructokinase, and phosphate acetyltransferase (Figure S4A-C)." - The authors spent much time analyzing the adjacent pathways related to oxalate and oxalate-related products of oxalate metabolism. However, my understanding is that the genes used to analyze these pathways (formate metabolism, acetogenesis, methanogenesis), such as the ones named above, are not unique/specific for those pathways but participate in other "housekeeping" pathways. What is the relevance of these analyses when those genes are not unique/specific to the function/pathways that the authors describe? If I infer correctly, these bioinformatic analyses aim to evaluate the hypothesis of whether oxalate metabolism could be a social/cooperation metabolism and whether other species could participate in the metabolism of oxalate subproducts. However, these analyses did not explicitly evaluate this hypothesis.

      The reviewer is correct in that we aimed to evaluate the potential that oxalate metabolism could benefit from metabolic cooperation.  The specific genes chosen for this analysis were those explicitly listed in the target metabolic pathways in KEGG, as described.  However, while the analyses do show the strong potential that the CO2 and formate produced from oxalate degradation could be used in these other pathways, as intended, the genes can be used in other metabolic pathways.  We did, however, explicitly test the hypothesis that formate, produced from oxalate degradation, could be utilized by the gut microbiota.  While the targeted transplants with the taxonomic cohort did not clearly show the use of formate in this way, those from the metabolic cohort did (Figures 6d and S8d).  This question is still in ongoing investigations in our group.  

      We have made it more clear that our genome analyses provide the potential for metabolic redundancy rather than definitive proof for metabolic redundancy, which was evaluated more extensively in other experiments from this study.

      (a) Lines 481-484. "Collectively, data offer strong support for the hypothesis that metabolic redundancy among diverse taxa, is the primary driver of oxalate homeostasis, rather than metabolic cooperation in which the by-products of oxalate degradation are used in downstream pathways such as acetogenesis, methanogenesis, and sulfate reduction." - Although the authors recognize that their data about the metabolic cooperation hypothesis is inconclusive, they never tested the hypothesis related to metabolic cooperation, as mentioned above. This is highly speculative.

      As stated above, the targeted microbial transplants to animals and in vitro studies (Figure 5e,f) did explicitly test the cooperation hypothesis, but the results did not support it and instead pointed much more strongly to metabolic redundancy.

      (4) Lines 355-359. "Cohorts, defined in the STAR methods, were used to delineate hypotheses that either carbon and energy substrates are sufficient to explain known effects of the oxalate-degrading microbial network or that additional aspects of taxa commonly stimulated by dietary oxalate are required to explain past results (taxa defined through previous meta-analysis of studies)." - The definition of the metabolic cohorts and the taxonomic cohorts should not be hidden in the material and methods section. It should be explicit and clearly explained in the main text. Related, the table presented in Figure 5D is exceptionally confusing and does not help to understand and differentiate between the metabolic and the taxonomic cohorts. The authors need to explicitly identify the synthetic communities used in each cohort and each group by their members and their characteristics in supplementary tables.

      In the sentences before those referenced, we state: “Culturomic data recapitulates molecular data to show a considerable amount of redundancy surrounding oxalate metabolism (Fig. 5C). Isolates generated from this assay were used for subsequent study (metabolic cohort; Figure 5D). Additionally, a second cohort was defined and commercially purchased based both on known metabolic functions and the proportion of studies that saw an increase in their taxonomic population with oxalate consumption (Fig. 5D; taxonomic cohort). Where possible, isolates from human sources were obtained.”  Figure 5d explicitly shows the specific species used in each cohort along with the groups they were in for transplant studies, the explicit metabolic pathways we were targeting, along with the % of studies that these species were associated with oxalate metabolism.  All of this information is both in the main text of the results and in the figure legends.  It is not hidden in the methods, but the methods do reiterate what was also placed in the results.   

      In Figures 5 and 6, the authors used the following groups with the corresponding nomenclature: 'Group 1, No_bact; Group 2, Ox; Group 3, Ox_form; Group 4, All; Group 5, No_ox'. Although the information related to these groups is present in the material and method section in lines 1139-1143, the authors also need to explicitly explain the groups and their nomenclature in the main text.

      Since this information is explicitly and succinctly given in the referenced figures, I believe that adding the same information in the text would be too redundant.

      Related to the development of the synthetic communities. How did the authors prepare the synthetic communities or 'cohort' for the in vitro experiments? 

      We added more information for the preparation of microbes and execution of the in vitro assays, as needed.  

      Also, it is unclear in the material and method section how the metabolic profile of each isolate was evaluated (Figure 5C). Related to the bacteria isolated from the culturomic assays, including Figure 5C and metabolic cohort, the authors indeed reported the isolation methodology in lines 1262-1275. However, there is no information about the sequencing of these isolates. The authors should present these isolates as a list (supplementary table) with their names, taxonomy, metabolic profile, and Genome ID if these genomes were submitted to NCBI.

      We added additional information for how metabolic cohort isolates were chosen and how they were taxonomically identified.  The taxonomy and substrate utilization of isolates are in Figure 5D.  We did not sequence the genomes of metabolic cohort bacteria.  However, the ATCC isolates, which comprise the taxonomic cohort, are publicly available.

      The authors presented the 248 metagenomic assemblies in Figure S1 in a circular chart in context with other genomes. However, the metagenomic assemblies should be presented in table form, with their name, taxonomy, coverage, completeness, and Genome ID, if these genomes were submitted to NCBI.

      The information for the genomes submitted to the NCBI is provided in the data availability statement.  However, we added a table (Table S9) that includes the requested information.   

      (5) Lines 371-374: "To delineate hypotheses of metabolic redundancy or cooperation for mitigating the negative effects of oxalate on the gut microbiota and host, two independent diet trials were conducted with analogous microbial communities derived from the metabolic and taxonomic cohorts".

      Lines 494-496: "we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present" - What is the evidence that oxalate has a negative effect on the gut microbiota? The authors clearly showed the negative effect of oxalate on the host. Although there are reports in the literature of oxalate consumers with a negative effect on the microbiome, such as Lactobacilli and Bifidobacteria, there is no evidence in this manuscript about a negative effect of oxalate on the microbiome, and there is not an experimental design to evaluate it.

      These data are presented in Figure 2A and B. As stated, oxalate led to a net reduction of 34 in the total microbial metabolites produced, with a significant shift in the overall metabolome, indicative of metabolic inhibition. This is in comparison to the net gain of 9 metabolites, with no significant shift overall, in the mice with the NALB microbiota. The positive and negative effects of oxalate on the whole gut microbiota here are bolstered by previous studies on the effect of oxalate on pure cultures, as discussed and cited on lines 623-624.

      (6) Related to the last section, it is hard to really compare the results of the taxonomic cohort versus the metabolic cohort when the data of one cohort is in the main figure and the other in a supplementary figure. In addition, all the comparisons between the two cohorts seem to be qualitative. For any comparisons, the authors need to do a statistical comparison between the groups of the two cohorts.

      The comparison of the two sets of data is indeed qualitative. This is because these mouse models were run in separate experiments to test separate hypotheses (whether utilization of specific substrates is enough to improve oxalate metabolism, or whether specific taxa previously responsive to dietary oxalate were better, which is stated in the manuscript). Given that these experimental models were tested separately, it would not be statistically valid to do a direct statistical comparison, even though the experimental procedures were the same and the only difference was the transplanted bacteria. The separation of the experiments into a main and supplemental figure was done out of necessity given the very large amount of data and many experimental mouse models that were run in this study overall.

      Minor Comments.

      (1) The authors should define 'antinutrients'. This term is not a familiar concept and could create confusion.

      This is defined in line 104 “molecules produced in plants to deter herbivory, disrupt homeostasis by targeting the function of the microbiome, host, or both”

      (2) The authors should explicitly describe the N. albigula, aka White-throated woodrat system, as early as possible in the result section.

      We added some statements about the Swiss Webster and N. albigula gut microbiota as poor and effective oxalate degraders, respectively, in the second section of the results.

      (3) SW-SW mice exhibited an oxalate-dependent alteration of 219 hepatic genes, with a net increase in activity. In comparison, the SW-NALB mice exhibited an oxalate-dependent alteration of 21 genes with a net decrease in activity. However, the visual representation of the PCoA in Figure 1B showed that the most different samples are the SW-NALB 0% and 1.5%. Could you please explain this difference?

      In Figure 1b, the SW-NALB data are represented by the blue and black data points, which directly overlap with each other.  The SW-SW data are the orange and purple data points, which exhibit very little overlap.  

      (4) Is Table S7 the same as Table S6? If not, there is a missing supplementary table.

      These tables are different.  We ensured that both are present.

      (5) How did the authors test bacterial growth in in vivo studies (Figure 5B)?

      We added a statement to the culturomic section of the methods – we used media with or without oxalate and quantified colony-forming units.

      (6) A section of 16S rRNA metagenomics in the material and method section is not used across the main manuscript.

      These data are presented in figures S7 and S10, as stated in the results.  We added statements in the results to clarify that these figures show the 16S sequencing data.

      (7) Lines 506-511: "Collectively, data from the current and previous studies on the effect of oxalate exposure on the gut microbiota support the hypothesis that the gut microbiota serves as an adaptive organ in which specific, metabolically redundant microbes respond to and eliminate dietary components, for the benefit of themselves, but which can residually protect or harm host health depending on the dietary molecules and gut microbiota composition." - What is the benefit to bacteria in eliminating oxalate? This is highly speculative to this system.

      The benefit to bacteria is stated earlier in that paragraph – “In the current (Figs. 2B, 5B) and previous studies(33,34,64,65), we and others have found that oxalate can differentially exhibit positive or negative effects on microbial growth and metabolism dependent on the species and environment present.”

      (8) Lines 504-506: "Importantly, the near-universal presence of formate metabolism genes suggest that formate may be an even greater source of ecological pressure (Figures S2-S5)."

      - Formate is primarily produced by fermentative anaerobic bacteria, such as Bacteroides, Clostridia, and certain species of Escherichia coli, since formate would be present in anaerobic communities independently of oxalate. How is formate an even greater source of ecological pressure?

      We added a statement about the toxicity of formate to both bacteria and mammalian hosts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. Whereby, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs were far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.
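
      The reasoning in this response can be illustrated with a toy calculation. The sketch below is not the authors' analysis pipeline: it assumes synthetic spectra, a frequency-independent broadband scaling, and hypothetical names (`alpha_change`, `p_base`, `p_stim`). It only shows how an alpha decrease can appear as an increase until the broadband change, estimated here from the 70-180 Hz range, is subtracted.

```python
import numpy as np

def alpha_change(freqs, p_stim, p_base,
                 alpha_band=(8.0, 13.0), broadband_band=(70.0, 180.0)):
    """Return (raw, corrected) log10 alpha power change.

    raw       : mean log10(stimulus / baseline) power inside the alpha band
    corrected : raw minus the mean log10 ratio in the broadband range,
                i.e. the alpha change after removing the broadband shift.
    The band edges are illustrative assumptions.
    """
    log_ratio = np.log10(p_stim / p_base)
    in_alpha = (freqs >= alpha_band[0]) & (freqs <= alpha_band[1])
    in_bb = (freqs >= broadband_band[0]) & (freqs <= broadband_band[1])
    raw = float(log_ratio[in_alpha].mean())
    broadband = float(log_ratio[in_bb].mean())
    return raw, raw - broadband

# Synthetic spectra (hypothetical values, for illustration only).
freqs = np.arange(1.0, 201.0)                      # 1..200 Hz
aperiodic = 100.0 / freqs**2                       # 1/f^2 background
bump = np.exp(-((freqs - 10.0) ** 2) / (2 * 1.5**2))  # alpha peak at 10 Hz

p_base = aperiodic * (1.0 + 4.0 * bump)            # strong alpha at baseline
p_stim = 3.0 * aperiodic * (1.0 + 1.0 * bump)      # broadband x3, alpha reduced

raw, corrected = alpha_change(freqs, p_stim, p_base)
print(f"raw alpha change: {raw:+.3f}, corrected: {corrected:+.3f}")
```

      In this toy case the raw alpha-band change is positive while the corrected change is negative, which mirrors the discrepancy between intracranial and extracranial reports described above.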

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the author's motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the author's work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the author's team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the author's findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the author's approach to these questions do reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and, for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, “Graphical and statistical support for primary claims.”
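
      One common way to respect this kind of nesting is a hierarchical bootstrap, which resamples subjects first and electrodes within each sampled subject second. The sketch below is an assumption about how such an analysis could look, not a description of the analyses reported in the supplement; the function name, the subject labels, and the per-electrode values are all hypothetical.

```python
import numpy as np

def hierarchical_bootstrap_ci(values_by_subject, n_boot=2000, ci=95.0, seed=0):
    """Percentile confidence interval for the mean of nested data.

    values_by_subject maps a subject label to that subject's per-electrode
    values. Each bootstrap draw resamples subjects with replacement, then
    resamples electrodes with replacement within each drawn subject.
    """
    rng = np.random.default_rng(seed)
    subjects = list(values_by_subject)
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        drawn = rng.choice(len(subjects), size=len(subjects), replace=True)
        pooled = []
        for idx in drawn:
            vals = np.asarray(values_by_subject[subjects[idx]], dtype=float)
            pooled.append(rng.choice(vals, size=vals.size, replace=True))
        boot_means[b] = np.concatenate(pooled).mean()
    half = (100.0 - ci) / 2.0
    low, high = np.percentile(boot_means, [half, 100.0 - half])
    return float(low), float(high)

# Hypothetical per-electrode values, e.g. log ratios of two pRF size estimates.
example = {
    "S1": [0.40, 0.50, 0.60],
    "S2": [0.30, 0.70],
    "S3": [0.50, 0.45, 0.55, 0.60],
}
low, high = hierarchical_bootstrap_ci(example)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

      Resampling at the subject level keeps a subject with many electrodes from dominating the interval, which is the bias the reviewer's question is concerned with.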

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the author's work, however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors frequency domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regression out of the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.

      Recommendation 2.6

      I have a similar request for the authors latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is very critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on latency adjustment provided in section 4.6.
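      The point that spectral-domain analyses discard phase can be illustrated with a toy example. The sketch below is not the authors' code: the signal, the 5-sample shift, and the function name are hypothetical. It shows that a circular time shift leaves the power spectrum unchanged. For real epoched data a shift also changes which samples fall inside the window, so the invariance is only approximate, consistent with the statement that small shifts are not critical.

```python
import cmath
import math


def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform (phase is discarded)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2
        for k in range(n)
    ]


n = 64
# Toy signal: a 10-cycle oscillation plus a weaker 3-cycle component.
signal = [
    math.sin(2 * math.pi * 10 * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
    for t in range(n)
]

shift = 5  # hypothetical latency error, in samples
shifted = signal[-shift:] + signal[:-shift]  # circular shift by `shift` samples

# Largest difference in power at any frequency between the two versions.
max_diff = max(
    abs(a - b) for a, b in zip(power_spectrum(signal), power_spectrum(shifted))
)
```

      Here `max_diff` is zero up to floating-point rounding, because the shift changes only the phase of each Fourier coefficient.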

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks, where power in a trial is the geometric mean of the power at each frequency within the defined band. This is equation 3 in the methods, which is now referenced the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is defined in equations 1 and 2, which are also now referenced the first time summary metrics are mentioned in the results.

      We added explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.
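      As a rough illustration of the broadband summary metric described above, the sketch below computes the ratio of geometric-mean band power in a trial to that during blanks. It is not the authors' implementation: the function name and the power values are hypothetical, and it assumes that blank power is summarized with the same geometric mean as trial power.

```python
from statistics import geometric_mean


def broadband_summary(trial_power, blank_power):
    """Ratio of geometric-mean band power in a trial to that during blanks.

    Each argument holds the power at each frequency within the chosen band.
    Units cancel in the ratio.
    """
    return geometric_mean(trial_power) / geometric_mean(blank_power)


# Made-up power values at four frequencies within a band.
trial = [4.0, 2.0, 1.0, 0.5]
blank = [2.0, 1.0, 0.5, 0.25]

ratio = broadband_summary(trial, blank)
```

      With these made-up values every frequency has twice the blank power, so the ratio is 2. Because the geometric mean averages in log space, a single high-power frequency does not dominate the summary.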

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have made substantial changes to deal with subject-specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group by participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. The same applies to Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement that summarizes for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant, section Graphical and statistical support for primary claims.

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.
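      The probabilistic sampling described in point (2) can be sketched as follows. This is a minimal illustration rather than the authors' resampling code: the function name, the seed, and the number of draws are assumptions, and the probabilities are those of the hypothetical electrode in the example above.

```python
import random


def sample_assignment(probs, rng):
    """Draw one area label for an electrode from its assignment probabilities."""
    labels = list(probs)
    weights = [probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


# Hypothetical electrode: 20% V1-V3, 30% dorsolateral, 50% neither.
electrode = {"V1-V3": 0.2, "dorsolateral": 0.3, "neither": 0.5}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_draws = 10_000
counts = {label: 0 for label in electrode}
for _ in range(n_draws):
    # Each draw places the electrode in exactly one group (or in neither).
    counts[sample_assignment(electrode, rng)] += 1

# Fraction of draws in which the electrode contributed to each group.
fractions = {label: counts[label] / n_draws for label in counts}
```

      Within any single draw the electrode belongs to at most one group, and across draws its contribution to each group approaches its assignment probability, which is why the procedure behaves like a weighted mean.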

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3. Top row), I wonder if this could entirely explain the larger Alpha pRF (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4 Statistics

      Second, examining closely the entire 2.4 section there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section that concisely states each major claim and explicitly annotates the supporting evidence (Section 4.7). Please also refer to our responses to Reviewer #2 regarding statistical testing (Reviewer weakness 2.4 “Statistical testing”).

      Weakness 3.5 Summary

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECOG study. That's fine. Best practice would be to have a linear mixed effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figure 4,5) the sample size might be too small even for that. At the very minimum, I would expect to show in figures/describe in text all results per patient (perhaps one can do statistics within each patient, and show for each patient that the effect is significant). Even in primate studies with just two subjects it is expected to show that the results replicate for subject A and B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode that has 60% of being in dorsolateral cortex and only 10% to be in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale to include such electrode in the analysis for let's say V1-V3 (we have weak evidence to believe it's there)? I would either assign electrodes based on the highest probability, or alternatively do a weighted mean based on the probability of each electrode belonging to each region group (e.g. electrode with 40% to be in V1-V3, will get twice the weight as an electrode who has 20% to be in V1-V3) but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRF are larger simply because they have a worse fit to the neural data, I would show if there is a correlation between the goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha & broadband receptive field size when controlling for the goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Astrocytes are known to express neuroligins 1-3. Within neurons, these cell adhesion molecules perform important roles in synapse formation and function. Within astrocytes, a significant role for neuroligin 2 in determining excitatory synapse formation and astrocyte morphology was shown in 2017. However, there has been no assessment of what happens to synapses or astrocyte morphology when all three major forms of neuroligins within astrocytes (isoforms 1-3) are deleted using a well characterized, astrocyte specific, and inducible cre line. By using such selective mouse genetic methods, the authors here show that astrocytic neuroligin 1-3 expression in astrocytes is not consequential for synapse function or for astrocyte morphology. They reach these conclusions with careful experiments employing quantitative western blot analyses, imaging and electrophysiology. They also characterize the specificity of the cre line they used. Overall, this is a very clear and strong paper that is supported by rigorous experiments. The discussion considers the findings carefully in relation to past work. This paper is of high importance, because it now raises the fundamental question of exactly what neuroligins 1-3 are actually doing in astrocytes. In addition, it enriches our understanding of the mechanisms by which astrocytes participate in synapse formation and function. The paper is very clear, well written and well illustrated with raw and average data.

      We thank the reviewer for the balanced and informative summary.

      Reviewer #2 (Public Review):

      In the present manuscript, Golf et al. investigate the consequences of astrocyte-specific deletion of Neuroligin family cell adhesion proteins on synapse structure and function in the brain. Decades of prior research had shown that Neuroligins mediate their effects at synapses through their role in the postsynaptic compartment of neurons and their transsynaptic interaction with presynaptic Neurexins. More recently, it was proposed for the first time that Neuroligins expressed by astrocytes can also bind to presynaptic Neurexins to regulate synaptogenesis (Stogsdill et al. 2017, Nature). However, several aspects of the model proposed by Stogsdill et al. on astrocytic Neuroligin function conflict with prior evidence on the role of Neuroligins at synapses, prompting Golf et al. to further investigate astrocytic Neuroligin function in the current study. Using postnatal conditional deletion of Neuroligins 1, 2 and 3 specifically from astrocytes, Golf et al. show that virtually no changes in the expression of synaptic proteins or in the properties of synaptic transmission at either excitatory or inhibitory synapses are observed. Moreover, no alterations in the morphology of astrocytes themselves were found. The authors conclude that while Neuroligins are indeed expressed in astrocytes and are hence likely to play some role there, this role does not include any direct consequences on synaptic structure and function, in direct contrast to the model proposed by Stogsdill et al.

      Overall, this is a strong study that addresses an important and highly relevant question in the field of synaptic neuroscience. Neuroligins are not only key regulators of synaptic function, they have also been linked to numerous psychiatric and neurodevelopmental disorders, highlighting the need to precisely define their mechanisms of action. The authors take a wide range of approaches to convincingly demonstrate that under their experimental conditions, no alterations in the levels of synaptic proteins or in synaptic transmission at excitatory or inhibitory synapses, or in the morphology of astrocytes, are observed.

      We are also grateful for this reviewer’s constructive comments.

      One caveat to this study is that the authors do not directly provide evidence that their Tamoxifen-inducible conditional deletion paradigm does indeed result in efficient deletion of all three Neuroligins from astrocytes. Using a Cre-dependent tdTomato reporter line, they show that tdTomato expression is efficiently induced by the current paradigm, and they refer to a prior study showing efficient deletion of Neuroligins from neurons using the same conditional Nlgn1-3 mouse lines but a different Cre driver strategy. However, neither of these approaches directly provide evidence that all three Neuroligins are indeed deleted from astrocytes in the current study. In contrast, Stogsdill et al. employed FACS and qPCR to directly quantify the loss of Nlgn2 mRNA from astrocytes. This leaves the current Golf et al. study somewhat vulnerable to the criticism, however unlikely, that their lack of synaptic effects may be a consequence of incomplete Neuroligin deletion, rather than a true lack of effect of astrocytic Neuroligins.

      The concern is valid. In the original submission of this paper, we did not establish that the Cre recombinase we used actually deleted neuroligins in astrocytes. We have now addressed this issue in the revised paper with new experiments as described below.

      However, the reviewer’s impression that the Stogsdill et al. paper confirmed full deletion of Nlgn2 is a misunderstanding of the data in that paper. The reviewer is correct that Stogsdill et al. performed FACS to test the efficacy of the GLAST-Cre mediated deletion of Nlgn2-flox mice, followed by qRT-PCR comparing heterozygous with homozygous mutant mice. With their approach, no wild-type control could be used, as these would lack reporter expression. However, this experiment does NOT allow conclusions about the degree of recombination, both overall recombination (i.e. recombination in all astrocytes regardless of TdT+) and recombination in TdT+ astrocytes because it doesn’t quantify recombination. To quantify the degree of recombination, the paper would have had to perform genomic PCR measurements.  

      The problem with the data on the degree of recombination in the Stogsdill et al. (2017) paper, as we understand them, is two-fold.

      First, the GLAST-Cre line only targets ~40-70% of astrocytes, at least as evidenced by highly sensitive Cre-reporter mice in a variety of studies using this Cre line. The 40-70% variation is likely due to differences in the reporter mice and the tamoxifen injection schedule used. In comparison, we are targeting most astrocytes using the Aldh1l1-CreERT2 mice. Moreover, GLAST-Cre mice exhibit neuronal off-targeting, consistent with at least some of the remaining Nlgn2 qRT-PCR signal in the FACS-sorted cells. As we describe next, this signal also likely comes from astrocytes where recombination was incomplete. This is the reason why we, like everyone else, are now using the Aldh1l1-Cre line that has been shown to be more efficient both in terms of the overall targeting of astrocytes (i.e. nearly complete) and the level of recombination observed in reporter(+) astrocytes.

      Second, Stogsdill et al. detected a significant decrease in the Nlgn2 qRT-PCR signal in the FACS-sorted homozygous Nlgn2 KO cells compared to the heterozygous Nlgn2 KO cells but the Nlgn2 qRT-PCR signal was still quite large. The data are presented as normalized to the HET condition. As a result, we don’t know the true level of gene deletion (i.e. compared to TdT- astrocytes). For example, based on the Stogsdill et al. data the HET manipulation could have induced only a 20% reduction in Nlgn2 mRNA levels in TdT(+) astrocytes, in which case the KO would have produced a 40% reduction in Nlgn2 mRNA in TdT(+) astrocytes. Moreover, it is possible based on our own experience with the GLAST-Cre line, that the reporter may also not turn on in some astrocytes where other alleles have been independently recombined – just as some astrocytes that are TdT(+) would still be wild-type or heterozygous for Nlgn2. Thus, it is impossible to calculate the actual percentage of recombination from these data, even in TdT(+) cells, absent PCR of genomic DNA from isolated cells. Alternatively, comparison of mRNA levels using primers sensitive to floxed sequences in wild-type controls versus cKO mice would have also yielded a much better idea of the recombination efficiency.

      In summary, it is unclear whether the Nlgn2 deletion in the Stogsdill et al. paper was substantial or marginal – it is simply impossible to tell.
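      The normalization argument above can be made concrete with a short calculation. The sketch below uses the 20%/40% scenario from the text plus a second, purely hypothetical scenario. The function name is an assumption, and reductions are expressed as fractions of the level in unrecombined astrocytes.

```python
def ko_relative_to_het(het_reduction, ko_reduction):
    """KO mRNA level expressed as a fraction of the HET level.

    Reductions are fractions of the unrecombined level (0.2 means 20% lower).
    """
    return (1.0 - ko_reduction) / (1.0 - het_reduction)


# Scenario from the text: HET lowers Nlgn2 mRNA by 20%, KO by 40%.
modest = ko_relative_to_het(0.20, 0.40)   # 0.6 / 0.8, about 0.75

# Hypothetical alternative: HET lowers it by 50%, KO by 62.5%.
strong = ko_relative_to_het(0.50, 0.625)  # 0.375 / 0.5 = 0.75
```

      Both scenarios give a KO signal of about 75% of the HET signal, so a HET-normalized readout alone cannot distinguish a 40% reduction from a 62.5% reduction.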

      Reviewer #3 (Public Review):

      This study investigates the roles of astrocytes in the regulation of synapse development and astrocyte morphology using conditional KO mice carrying mutations of three neuroligins1-3 in astrocytes with the deletion starting at two different time points (P1 and P10/11). The authors use morphological, electrophysiological, and cell-biological approaches and find that there are no differences in synapse formation and astrocyte cytoarchitecture in the mutant hippocampus and visual cortex. These results differ from the previous results (Stogsdill et al., 2017), although the authors make several discussion points on how the differences could have been induced. This study provides important information on how astrocytes and neurons interact with each other to coordinate neural development and function. The experiments were well-designed, and the data are of high quality.

      We also thank this reviewer for helpful comments!

      Recommendations for the authors:

      This project was meant to rigorously test the intriguing overall question whether neuroligins, which are abundantly expressed in astrocytes, regulate synapse formation as astrocytic synapse organizers. The goal of the paper was NOT to confirm or dispute the conclusion by Stogsdill et al. (Nature 2017) that Nlgn2 expressed in astrocytes is essential for excitatory synapse formation and that astrocytic Nlgn1-3 are required for proper astrocyte morphogenesis. Instead, the project was meant to address the much broader question whether the abundant expression of any neuroligin, not just Nlgn2, in astrocytes is essential for neuronal excitatory or inhibitory synapse formation and/or for the astrocyte cytoarchitecture. We felt that this was an important question independent of the Stogsdill et al. paper. We analyzed in our experiments young adult mice, a timepoint that was chosen deliberately to avoid the possibility of observing a possible developmental delay rather than a fundamental function that extends beyond development.

      We do recognize that the conclusion by Stogsdill et al. (2017) that Nlgn2 expression in astrocytes is essential for excitatory synapse formation was very exciting to the field but contradicted a large literature demonstrating that Nlgn2 protein is exclusively localized to inhibitory synapses and absent from excitatory synapses (to name just a few papers, see Graf et al., Cell 2004; Varoqueaux et al., Eur. J. Cell Biol. 2004; Patrizi et al., PNAS 2008;  Hoon et al., J. Neurosci. 2009). In addition, the conclusion of Stogsdill et al. that astrocytic Nlgn2 specifically drove excitatory synapse formation was at odds with previous findings documenting that the constitutive deletion of Nlgn2 in all cells, including astrocytes, has no effect on excitatory synapse numbers (again, to name a few papers, see Varoqueaux et al., Neuron 2006; Blundell et al., Genes Brain Behav. 2008; Poulopoulos et al., Neuron 2009; Gibson et al., J. Neurosci. 2009). These contradictions conferred further urgency to our project, but please note that this project was primarily driven by our curiosity about the function of astrocytic neuroligins, not by a fruitless desire to test the validity of one particular Nature paper.

      The general goal of our paper notwithstanding, few papers from our lab have received as much attention and as many negative comments on social media as this paper when it was published as a preprint. Because we take these criticisms seriously, we have over the last year performed extensive additional experiments to ensure that our findings are well founded. We feel that, on balance, our data are incompatible with the notion that astrocytic neuroligins play a fundamental role in excitatory synapse formation but are consistent with other prior findings obtained with neuroligin KO mice. In the new data we added to the paper, we not only characterized the Cre-mediated deletion of neuroligins in depth, but also employed an independent second system (human neurons cultured on mouse glia) to further validate our conclusions as described below. Although we believe that our results are incompatible with the notion that astrocytic neuroligins fundamentally regulate excitatory or inhibitory synapse formation, we also conclude with regret that we still don’t know what astrocytic neuroligins actually do. Thus, the function of astrocytic neuroligins, as there surely must be one, remains a mystery.

      Finally, there are many possible explanations for the discrepancies between our conclusions and those of Stogsdill et al. as described in our paper. Most of these explanations are technical and may explain why not only our results, but also those of many other previous studies from multiple labs, are inconsistent with the conclusions by Stogsdill et al. (2017), as discussed in detail in the revised paper.

      Reviewer #1 (Recommendations For The Authors):

      The paper is very clear and well written. I have only one comment and that is to increase the sizes of Figs 2, 4 and 6 so that the imaging panels can be seen more clearly. Also, although I know the n numbers are provided in the figure legends, the authors may help the reader by providing them in the results when key data and findings are reported.

      We agree and have followed the reviewer’s suggestions as best as we could.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the strength and importance of the claims that the authors make, I would highly recommend adding some quantitative evidence regarding the efficacy of deletion in astrocytes, e.g. using the same strategy as in Stogsdill et al. As unlikely as it may be that Neuroligin deletion is in fact incomplete, this possibility cannot be excluded unless directly measured. To avoid future discussions on this subject, it seems that the onus is on the authors to provide this information.

      We concur that this is an important point and have devoted a year-long effort to address it. Note, however, that the strategy employed by Stogsdill et al. does not actually allow conclusions about their recombination efficiency. As described above, it only allows the conclusion that some recombination took place. The Stogsdill et al. Nature paper (2017) is a bit confusing on this point. This approach is thus not appropriate to address the question raised by the reviewer.

      We have performed two experiments to address the issue raised by the reviewer.

      First, we used a viral (i.e. AAV2/5) approach to express Rpl22 with a triple HA-tag, also known as Ribotag, which allows us to purify ribosome-bound mRNA from targeted cells for downstream gene expression analysis. The novel construct is driven by the GfaABC1D promoter and includes two additional features which make it particularly useful. First, upstream of Ribotag is a membrane-targeted Lck-mVenus followed by a self-cleaving P2A sequence. This allows easy visualization of targeted astrocytes. Second, we have incorporated a cassette of four copies of six miRNA targeting sequences (4x6T) for miR-124 as was recently published (Gleichman et al., 2023) to eliminate off-target expression in neurons. Based on qPCR analysis, the updated construct allowed >95% de-enrichment of neuronal mRNA and slightly improved observed recombination rates (~10% per gene) relative to an earlier version without 4x6T. Mice that were injected with tamoxifen at P1, similar to other experiments in the paper, were then stereotactically injected at ~P35-40 within the dorsal hippocampus with AAV2/5-GfaABC1D-Lck-mVenus-P2A-Rpl22-HA-4x6T. Approximately 3 weeks later, acute slices were prepared, visualized for fluorescence, and both CA1 and nearby cortex that was partially targeted were isolated for downstream ribosome affinity purification with HA antibodies. Total RNA was saved as input. qPCR was performed using assays that are sensitive to the exons that are floxed in the Nlgn123 cKO mice, so that our quantifications are not confounded by potential differences in nonsense-mediated decay. Our control data reveal a striking enrichment of an astrocyte marker gene (e.g. aquaporin-4) and de-enrichment of genes for other cell types. In the CA1, we observed robust loss of Nlgn3 (~96%), Nlgn2 (~86%), and Nlgn1 (65%) gene expression. In the cortex, we observed a similarly robust loss of Nlgn3 (93%), Nlgn2 (83%), and Nlgn1 (72%) expression.
Given that our targeting of astrocytes based on Ai14 Cre-reporter mice was ~90-99%, these reductions are striking and definitive. The existence of some residual transcript reflects the presence of a small population of astrocytes heterozygous for Nlgn2 and Nlgn3. In contrast, Nlgn1 appears more difficult to recombine and it is likely that some astrocytes are either heterozygous or homozygous knockout cells. Although it is thus possible that Nlgn1 could provide some compensation in our experiments, it is worth noting that Stogsdill et al. found that only Nlgn2 and Nlgn3 knockdown with shRNAs resulted in impaired astrocyte morphology by P21. Moreover, they found that Nlgn2 cKO in astrocytes with PALE of a Cre-containing pDNA impaired astrocyte morphology in a gene-dosage dependent manner and suppressed excitatory synapse formation at P21. Thus, our inability to delete all of Nlgn1 doesn’t readily explain contradictions between our findings and theirs.

      Second, in an independent approach we have cultured glia from mouse quadruple conditional Nlgn1234 KO mice and infected the glia with lentiviruses expressing inactive (DCre, control) or active Cre-recombinase. We confirmed complete recombination by PCR. We then cultured human neurons forming excitatory synapses on the glia expressing or lacking neuroligins and measured the frequency and amplitude of mEPSCs as a proxy for synapse numbers and synaptic function. As shown in the new Figure 9, we detected no significant changes in mEPSCs, demonstrating in this independent system that the glial neuroligins do not detectably influence excitatory synapse formation.

      (2) Along the same lines, the authors should be careful not to overstate their findings in this direction. For example, the figure caption for Figure 2 reads 'Nlgn1-3 are efficiently and selectively deleted in astrocytes by crossing triple Nlgn1-3 conditional KO mice with Aldh1l1-CreERT2 driver mice and inducing Cre-activity with tamoxifen early during postnatal development'. This is not technically correct and should be modified to reflect that the authors are not in fact assessing deletion of Nlgn1-3, but only expression of a tdTomato reporter.

      We agree – this is essentially the same criticism as comment #1.

      (3) In general, the animal numbers used for the experiments are rather low. With an n = 4 for most experiments, only large abnormalities would be detected anyway, while smaller alterations would not reach statistical significance due to the inherent biological and technical variance. For the most part, this is not a concern, since there really is no difference between WTs and Nlgn1-3 cKOs. However, trends are observed in some cases, and it is conceivable that these would become significant changes with larger n's, e.g. Figure 3H (Vglut2); Figure 4E (VGlut2 S.P., D.G.); Figure 6D (Vglut2). Increasing the numbers to n = 6 here would greatly strengthen the claims that no differences are observed.

      We concur that small differences would not have been detected in our experiments but feel that given the very large phenotypes of the neuroligin deletions in neurons and of the phenotypes reported by Stogsdill et al. (2017), which also did not employ a large number of animals, a very small phenotype in astrocytes would not have been very informative.

      Minor points:

      (1) Please state the exact genetic background for the mouse lines used.

      Our lab generally uses hybrid CD1/Bl6 mice to avoid artifacts produced by inbred genetic mutations in so-called ‘pure’ lines, especially Bl6 mice. This standard protocol was followed in the present study. Thus, the mice are on a mixed CD1/Bl6 hybrid background.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 4 demonstrates that neuroligin 1-3 deletions restricted to astrocytes do not affect the number of excitatory and inhibitory synapses in layer IV of the primary visual cortex. This conclusion could be further strengthened if the authors could provide electrophysiological evidence such as mE/IPSCs.

      We agree but have chosen a different avenue to further test our conclusions because slice electrophysiological experiments are time-consuming, labor intensive, and difficult to quantitate, especially in cortex.

      Specifically, we have co-cultured human neurons with astrocytes that either contain or lack neuroligins (new Fig. 9). With this experimental design, we have total control over ALL neuroligins in astrocytes. Electrophysiological recordings then demonstrated that the complete deletion of all glial neuroligins has no effect on mEPSC frequencies and amplitudes. Although clearly much more needs to be done, the new results confirm in an independent system that glial neuroligins have no effect on synapse formation in the neurons, even though neurons depend on astrocytes for synaptogenic factors as Ben Barres brilliantly showed a decade ago. However, it is important to note that dissociated glia in culture, while synaptogenic, are reactive and may not faithfully recapitulate all roles of astrocytes in synaptogenesis.

      (2) It would help readers if the images showing the punctate double marker stainings of excitatory/inhibitory synapses are presented in merged colors (i.e., yellow colors for red and green puncta colors).

      We have tried to improve the visualization of the rather voluminous studies we performed and illustrate in the figures as best as we could.

      (3) The resolutions of the images in the figures are not good, although I guess it is because the images are for review processes.

      We apologize and would like to assure the reviewer that we are supplying high-resolution images to the journal.

      (4) Typos in lines 82 and 274.

      We have corrected these errors.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimates of the correlation magnitude correspond to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial to trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al., 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 
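
      As a rough illustration of why the expected logLR depends on both the mean and the correlation, the sketch below assumes (this is not stated in the response) that each pair is drawn from an equal-variance Gaussian with mean +mu or -mu, standard deviation sigma, and correlation rho. Under that assumption the expected logLR of one pair is 4*mu^2 / (sigma^2 * (1 + rho)), so the mean can be rescaled to keep the evidence strength constant across correlations. The function names and numerical values are hypothetical.

```python
import math

def expected_loglr(mu, sigma, rho):
    """Expected logLR of one pair (x1, x2) under the true hypothesis.

    Assumption: equal-variance Gaussian pairs with means +mu vs. -mu,
    standard deviation sigma, and pairwise correlation rho.
    """
    return 4.0 * mu**2 / (sigma**2 * (1.0 + rho))

def matched_mean(mu0, rho):
    """Mean that gives the same expected logLR as mean mu0 at rho = 0."""
    return mu0 * math.sqrt(1.0 + rho)

# Hypothetical values, chosen only for illustration.
mu0, sigma = 0.1, 1.0
for rho in (-0.6, 0.0, 0.6):
    mu = matched_mean(mu0, rho)
    print(f"rho={rho:+.1f}  matched mean={mu:.4f}  "
          f"expected logLR={expected_loglr(mu, sigma, rho):.4f}")
```

      With the mean held fixed instead, a negative correlation raises the expected logLR and a positive correlation lowers it, which is why the mean has to shrink for negative and grow for positive correlations to equate evidence strength.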

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).
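
      For intuition, the scaling can be written out under an assumption that is not stated in this response: that each pair $\mathbf{x} = (x_1, x_2)$ is drawn from an equal-variance bivariate Gaussian with mean $(+\mu, +\mu)$ or $(-\mu, -\mu)$, standard deviation $\sigma$, and correlation $\rho$. The symbols are introduced here for illustration only.

```latex
\begin{aligned}
\log \mathrm{LR}(x_1, x_2)
  &= \log \frac{\mathcal{N}(\mathbf{x};\, +\boldsymbol{\mu},\, \Sigma)}
               {\mathcal{N}(\mathbf{x};\, -\boldsymbol{\mu},\, \Sigma)}
   = 2\, \boldsymbol{\mu}^{\top} \Sigma^{-1} \mathbf{x},
  \qquad
  \boldsymbol{\mu} = \begin{pmatrix} \mu \\ \mu \end{pmatrix},
  \quad
  \Sigma = \sigma^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \\[4pt]
  &= \frac{2\mu}{\sigma^{2}(1 + \rho)}\, (x_1 + x_2).
\end{aligned}
```

      In this sketch the weight $2\mu / (\sigma^{2}(1+\rho))$ grows as $\rho$ becomes more negative and shrinks as $\rho$ becomes more positive, matching the direction of the normative adjustment described in the reviews.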

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (Results, p.18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
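
      The equivalence described in this passage can be checked numerically with a minimal sketch. It uses a discrete-time random walk with a fixed bound, which is a simplification of the model in the paper (the response mentions a collapsing bound), and all names and parameter values below are hypothetical. Accumulating scale * (x1 + x2) to a bound B gives the same choice and decision time as accumulating (x1 + x2) to a bound B / scale.

```python
import random

def run_trial(samples, scale, bound):
    """Accumulate scale * (x1 + x2) until |DV| reaches bound.

    Returns (choice, number of pairs used); choice is 0 if no bound is hit.
    """
    dv = 0.0
    for n, (x1, x2) in enumerate(samples, start=1):
        dv += scale * (x1 + x2)
        if dv >= bound:
            return (1, n)
        if dv <= -bound:
            return (-1, n)
    return (0, len(samples))

def make_pairs(rng, n_pairs, mu, rho):
    """Draw unit-variance Gaussian pairs with mean mu and correlation rho."""
    pairs = []
    for _ in range(n_pairs):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = mu + z1
        x2 = mu + rho * z1 + (1 - rho**2) ** 0.5 * z2
        pairs.append((x1, x2))
    return pairs

rng = random.Random(0)
# Hypothetical values; a power-of-two scale keeps the comparison exact
# in floating point, so any mismatch would reflect the logic, not rounding.
scale, bound = 0.5, 3.0
trials = [make_pairs(rng, 200, mu=0.1, rho=-0.4) for _ in range(500)]

scaled_evidence = [run_trial(t, scale, bound) for t in trials]
scaled_bound = [run_trial(t, 1.0, bound / scale) for t in trials]

n_match = sum(a == b for a, b in zip(scaled_evidence, scaled_bound))
print(f"{n_match} of {len(trials)} trials give identical choice and decision time")
```

      The two formulations differ only in where the scale factor is applied, which is why a weight assigned to each observation as it arrives behaves like a rescaled bound even when the correlation is not known before the trial.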

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a post hoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (p. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.

      Excellent suggestion, added to p. 38.
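
      For readers without the Methods to hand, the requested distribution can be sketched as follows. This assumes (our illustration, not a quotation from the manuscript) that x1 and x2 are jointly Gaussian with a common mean mu_g, a common standard deviation sigma, and correlation rho; the symbols sigma and rho are introduced here for the sketch.

```latex
\begin{aligned}
\mathbb{E}[x_1 + x_2] &= \mu_g + \mu_g = 2\mu_g,\\
\operatorname{Var}[x_1 + x_2] &= \sigma^2 + \sigma^2 + 2\rho\sigma^2 = 2\sigma^2(1+\rho),\\
x_1 + x_2 &\sim \mathcal{N}\!\left(2\mu_g,\; 2\sigma^2(1+\rho)\right).
\end{aligned}
```

      Under these assumptions, a negative rho shrinks the variance of the sum and a positive rho inflates it, while the mean is unchanged.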

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (deltaAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript but doubt this choice was critical based on these results. We have clarified our implementation of the bound in Methods (p. 43–44).
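
      The difference between the two options discussed above can be sketched in a few lines. This is a minimal illustration, not the fitted model: the scale factor `s` stands in for the correlation-dependent adjustment, the floor at zero is our assumption, and all function names are hypothetical.

```python
def bound_initial_scaling(t, B, tb, s):
    """Scale only the starting height: 2 * (s*B - tb*t), floored at 0 (assumed floor)."""
    return max(0.0, 2.0 * (s * B - tb * t))


def bound_instantaneous_scaling(t, B, tb, s):
    """Scale the whole collapsing bound: s * 2 * (B - tb*t), floored at 0 (assumed floor)."""
    return max(0.0, s * 2.0 * (B - tb * t))


def collapse_time_initial(B, tb, s):
    """Time at which the initially scaled bound reaches 0; depends on s."""
    return s * B / tb


def collapse_time_instantaneous(B, tb, s):
    """Time at which the instantaneously scaled bound reaches 0; independent of s."""
    return B / tb
```

      With instantaneous scaling the bound reaches zero at t = B/tb whatever the value of `s`, which is the property the reviewer asks about; with initial-only scaling the collapse time shifts with `s`.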

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback. We have substantially revised this text to make these points more clearly.
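
      The equivalence the reviewer refers to can be sketched for a generic drift-diffusion process. The notation here (drift v, noise sigma, symmetric bound B, positive scale factor c) is introduced for illustration and need not match the manuscript's parameterization.

```latex
\begin{aligned}
dx &= c\,\bigl(v\,dt + \sigma\,dW\bigr), &\text{stop when } |x| &\ge B,\\
y &:= x / c \;\Rightarrow\; dy = v\,dt + \sigma\,dW, &\text{stop when } |y| &\ge B / c.
\end{aligned}
```

      Because x and y cross their respective bounds at the same moments and with the same sign, scaling the evidence (and hence the drift and noise together) by c produces the same choices and decision times as leaving the evidence unscaled and dividing the bound by c.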

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. The phylogenetic ANOVA we used (EVE) tests for a separate RNA expression optimum specific to the shrew lineage, consistent with expectations for adaptive evolution of gene expression. But, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that inferred adaptive expression of these genes is putative and outline that future studies are needed to test the function of proposed adaptations. For example, our cell line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illustrates this limitation. We also have refined our discussion to focus more on pathway-level analyses rather than on individual genes, and have addressed other issues presented, including clarity of methods and using sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We addressed your comments below: we now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, we recognize that not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in comparable environments as those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many, strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this experiment, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. This decision was made because we used only post-pubescent individuals for all species in the analysis, and this was the only season in which adult shrews were going through Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the rest of the species, but no genes showed significant decreases. We hypothesize two potential reasons for this result: 1) if a gene were to be selected for decreased expression, selection for constitutive expression of the gene across all species may be weak, and decreased expression may thus be found in other lineages as well, or 2) decreased or no expression may relax selection on the coding regions, and thus these genes are not pulled out as we identify 1:1 orthologs. This is consistent with results provided from the original methods manuscript. Thank you for pointing out that we did not discuss this information in the text, and we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point: while our extensive discussion is useful for generating hypotheses for future testing, some of it may ultimately be too speculative for our readers. To amend this, we have reduced some portions of our discussion and focused more on pathways than individual genes, including removing mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this highlights to the reader the speculative nature of many of our results.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this approach; it is greatly needed. After running this test, we found that the observed number of overlapping genes exceeded the expected overlap, though not significantly. We now show this in our methods (lines 277-278) and results (lines 719-720).
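
      The resampling procedure the reviewer describes can be sketched as below. This is a minimal illustration rather than the analysis actually run: the function names, the add-one p-value correction, and any set sizes passed in are our assumptions, and it treats every tested gene as equally likely to be drawn.

```python
import random


def overlap_null(n_genes, size_a, size_b, n_draws=10000, seed=0):
    """Overlap sizes between pairs of random gene sets drawn from one universe.

    n_genes is the number of genes tested in both analyses; size_a and size_b
    are the sizes of the two observed gene lists.
    """
    rng = random.Random(seed)
    universe = range(n_genes)
    overlaps = []
    for _ in range(n_draws):
        a = set(rng.sample(universe, size_a))
        b = set(rng.sample(universe, size_b))
        overlaps.append(len(a & b))
    return overlaps


def empirical_p(overlaps, observed):
    """One-sided p-value: share of random draws with overlap >= observed.

    Uses an add-one correction so the estimate is never exactly zero.
    """
    hits = sum(o >= observed for o in overlaps)
    return (hits + 1) / (len(overlaps) + 1)
```

      The mean of the returned overlaps estimates the expected overlap (approximately size_a * size_b / n_genes), and `empirical_p` indicates how often chance alone yields at least the observed number of shared genes.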

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, as well as used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals (evolutionary, constitutive, and plastic) can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling, as it not only exemplifies an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, and for the valuable feedback, which will greatly improve our manuscript. We hope that we have alleviated your concerns by following your suggestions, as described below.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters we deemed to be related to development, as this should not be attributed to changes associated with Dehnel’s phenomenon. We did this through qualitative, visual inspection, which we realize can differ between parties (i.e., clusters 2, 8, and 12 appeared to be seasonal). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: if expression was related to season and not development, then the averages of these stages within a cluster should be relatively similar. We have now quantified this as large differences in z-score (abs(summer juvenile-summer adult)>1.25) without meaningful interseason variations determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added it to both our methods (lines 699-702) and results (line 192).
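
      The thresholds quoted above can be expressed as a small predicate. This is a sketch under stated assumptions: the input is taken to be a cluster's mean z-score per stage, the dictionary keys and function name are hypothetical, and the "summer" in the winter-summer comparison is assumed to mean summer adults, which the response does not specify.

```python
def is_developmental_shift(z, dev_thresh=1.25, season_thresh=0.5):
    """Flag a cluster as a developmental shift using the quoted z-score cutoffs.

    z maps stage names to the cluster's mean z-score (assumed input format).
    """
    # Large juvenile-versus-adult gap between the two summer stages.
    dev_gap = abs(z["summer_juvenile"] - z["summer_adult"])
    # Little variation between seasons otherwise.
    autumn_winter = abs(z["autumn"] - z["winter"])
    winter_summer = abs(z["winter"] - z["summer_adult"])  # assumed: summer adults
    return (
        dev_gap > dev_thresh
        and autumn_winter < season_thresh
        and winter_summer < season_thresh
    )
```

      A cluster whose juveniles sit far from every later stage returns True, whereas one whose two summer stages are similar but whose winter values dip returns False.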

      Regarding the combination of clusters for pathway enrichment compared to individual pathways, we agree that combining clusters may be more informative for overall homeostasis, compared to individual clusters which may inform us on processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (i.e. lines 146, 285, and 332), and highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have added this to the line above (210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree the Bonferroni correction is strict. Results obtained with other, less strict methods for controlling the false discovery rate are also not significant after correction. These corrections can be found within the data; however, we report only the Bonferroni correction.
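
      To illustrate the difference in strictness under discussion, the following sketch compares Bonferroni-adjusted p-values with Benjamini-Hochberg adjusted values. Benjamini-Hochberg is chosen here only as a common example of a less strict procedure; the response does not state which alternatives were tried, and the function names are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      For any set of p-values, each Benjamini-Hochberg adjusted value is no larger than its Bonferroni counterpart, so a result that fails the former also fails the latter.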

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would use too much space in the text. We have thus added a description of the range of genes in a new supplemental table depicting this information.

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified seasonal shifts as large differences in z-score (abs(summer juveniles - summer adults) > 1.25) without meaningful interseason variations determined by a second local maximum (abs(autumn - winter) < 0.5 and abs(winter - summer) < 0.5), and added it to our methods (lines 699-702). We then follow this up with differential expression analyses as described in Figure 2.
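
      The thresholds quoted above can be sketched as a small filter over per-stage z-scores. This is only an illustration under stated assumptions: the function name, the dictionary keys, and the example values are hypothetical, and because the response does not say which summer stage enters the abs(winter - summer) comparison, that choice is exposed as a parameter rather than fixed.

```python
def meets_shift_criteria(z, summer_key="summer_adults",
                         large_diff=1.25, small_diff=0.5):
    """Apply the z-score thresholds quoted in the response to one gene.

    z is a dict of mean z-scores per stage with the (hypothetical) keys
    'summer_juveniles', 'autumn', 'winter', 'summer_adults'.
    summer_key selects which summer stage is compared against winter;
    the response does not specify this, so it is an assumption.
    """
    # Large difference between summer juveniles and summer adults.
    big_shift = abs(z["summer_juveniles"] - z["summer_adults"]) > large_diff
    # No meaningful variation between the intermediate seasons.
    flat_autumn_winter = abs(z["autumn"] - z["winter"]) < small_diff
    flat_winter_summer = abs(z["winter"] - z[summer_key]) < small_diff
    return big_shift and flat_autumn_winter and flat_winter_summer


# Made-up z-scores for illustration only.
gene_a = {"summer_juveniles": -1.5, "autumn": 0.2, "winter": 0.3, "summer_adults": 0.4}
gene_b = {"summer_juveniles": -1.5, "autumn": 0.2, "winter": -1.0, "summer_adults": 0.4}
print(meets_shift_criteria(gene_a))
print(meets_shift_criteria(gene_b))
```

      In this sketch, gene_a passes all three thresholds, while gene_b fails because its winter value departs from autumn by more than 0.5.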

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback; we agree with your conclusion. While a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether these shifts are caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have removed this grammatical error.

    1. Author response:

      Reviewer #1:

      Strengths:

      (1) Using a fairly generic ecological model, the method can identify the change in the relative importance of different ecological forces (distribution of interspecies interactions, demographic noise, and immigration) in different sample groups. The authors focus on the case of the human gut microbiota, showing that the data are consistent with a higher influence of species interactions (relative to demographic noise and immigration) in a disease microbiota state than in healthy ones. (2) The method is novel, original, and it improves the state-of-the-art methodology for the inference of ecologically relevant parameters. The analysis provides solid evidence for the conclusions. 

      Weaknesses:

      In the way it is written, this work might be mostly read by physicists. We believe that, with some rewriting, the authors could better highlight the ecological implications of the results and make the method more accessible to a broader audience.

      We thank the reviewer for their positive and constructive feedback. We particularly appreciate the recognition of the novelty and robustness of our method, as well as the insight that it sheds light on the shifting ecological forces between healthy and diseased microbiomes. In response to the concern about the manuscript’s accessibility, we aim to revise key sections – including the Introduction, Results, and Discussion – to more clearly articulate the ecological relevance of our theoretical findings. We would like to emphasize that our approach offers a novel perspective for analyzing individual species' abundances, as well as for understanding interaction patterns and stability at the community level. By placing our results within a broader context accessible to readers from diverse backgrounds, we aim for the revised version to appeal to a wider audience, including ecologists and microbiome scientists, while preserving the rigor of our underlying statistical physics framework.

      Reviewer #2:

      Strengths:

      A well-written article, relatively easy to follow and transparent despite the high degree of technicality of the underlying theory. The authors provide a powerful inferring procedure, which bypasses the issue of having only compositional data. 

      Weaknesses:

      (1) This sentence in the introduction seems key to me: "Focusing on single species properties as species abundance distribution (SAD), it fails to characterise altered states of microbiome." Yet it is not explained what is meant by 'fail', and thus what the proposed approach 'solves'.

      (2) Lack of validation, following arbitrary modelling choices made (symmetry of interactions, weak-interaction limit, uniform carrying capacity). Inconsistent interpretation of instability. Here, instability is associated with the transition to the marginal phase, which becomes chaotic when interaction symmetry is broken. But as the authors acknowledge, the weak interaction limit does not reproduce fat-tailed abundance distributions found in data. On the other hand, strong interaction regimes, where chaos prevails, tend to do so (Mallmin et al, PNAS 2024). Thus, the nature of the instability that unhealthy microbiomes approach is unclear.

      (3) Three technical points about the methodology and interpretation. a) How can the order parameters h and q0 be inferred, if in the compositional data they are fixed by definition? b) How is it possible that weaker interaction variance is associated with an approach to instability, when the opposite is usually true? c) Having an idea of how the empirical data compare to the theoretical fits would be valuable.

      Implications: As the authors say, this is a proof of concept. They point at limits and ways to go forward, in particular pointing at ways in which species abundance distributions could be better reproduced by the predicted dynamical models. One implication that is missing, in my opinion, is the interpretability of the results, and what this work achieves that was missing from other approaches (see weaknesses section above): what do we learn from the fact that changes in microbial interactions characterise healthy from unhealthy microbiota? For instance, what does this mean for medical research?

      We greatly appreciate the reviewer’s thoughtful analysis highlighting both the strengths and areas of ambiguity in our work.

      (1) To clarify the sentence on the limitations of species abundance distributions (SADs), we aim to explain in the revised version that while SADs summarize the relative abundance of individual species, they fail to capture the species-species correlations that we have shown (Seppi et al., Biomolecules 2023) to be more sensitive to the health state of the host. Our method thus focuses on the interaction statistics among species, providing insights into the underlying dynamics and stability of the microbiomes and their differences between healthy and unhealthy hosts.

      (2) Regarding model assumptions, we acknowledge that the weak interaction regime and symmetry hypotheses simplify the analysis and may not capture all empirical richness, such as fat-tailed distributions of species abundance. However, we interpret instability not as a path to chaos per se, but as a transition toward a multi-attractor phase, where each microbiome reaches a different fixed point. This is consistent with prior empirical findings invoking the “Anna Karenina principle”, where healthy microbiomes resemble one another, but disease states tend to deviate from this picture (see Pasqualini et al., PLOS Comp. Bio. 2024). We consider our framework as a starting point and agree that further extensions incorporating strong interaction regimes (as suggested by Mallmin et al., PNAS 2024) or relaxing other model assumptions could reveal even richer dynamical patterns. The computational pipeline we present can, in fact, be easily generalized to include different population dynamics models.

      On the technical questions: (a) While compositional data constrain relative abundances, we can still estimate diversity-dependent parameters (h and q0) using alpha-diversity statistics across samples, which show meaningful variation; (b) The counter-intuitive instability that the reviewer pointed out arises from the interplay between demographic stochasticity and quenched disorder. It is the combined contribution of these two factors in phase space – not either one alone – that drives the transition. For clarity, see Figure 1 in Altieri et al., Phys. Rev. Lett. 2021; (c) We plan to include plots that compare empirical data to theoretical model fits. This will help visualize how well the model captures observed microbial community properties and how, even with larger species interaction heterogeneity (σ) and larger demographic noise (T), healthy communities are more stable (i.e., distant from the critical line), as measured by the replicon eigenvalue.

      Finally, regarding interpretability and implications: by showing that ecological interaction networks – not just species identities – differ between healthy and unhealthy states, our work suggests a conceptual shift. This could inform medical strategies aimed at restoring community-level stability rather than targeting individual microbes. In the revised Discussion section, we will elaborate on this point to better highlight its practical implications and outline potential directions for future research.

      Reviewer #3:

      Strengths:

      The modeling efforts of this study primarily rely on a disordered form of the generalized Lotka-Volterra (gLV) model. This model can be appropriate for investigating certain systems, and the authors are clear about when and how more mechanistic models (i.e., consumer-resource) can lead to gLV. Phenomenological models such as this have been found to be highly useful for investigating the ecology of microbiomes, so this modeling choice seems justified, and the limitations are laid out. 

      Weaknesses:

      The authors use metagenomic data of diseased and healthy patients that were first processed in Pasqualini et al. (2024). The use of metagenomic data leads me to a question regarding the role of sampling effort (i.e., read counts) in shaping model parameters such as h. This parameter is equal to the average of 1/# species across samples because the data are compositional in nature. My understanding is that it was calculated using total abundances (i.e., read counts). The number of observed species is strongly influenced by sampling effort, so it would be useful if the number of reads were plotted against the number of species for healthy and diseased subjects. However, the role of sampling effort can depend on the type of data, and my instinct about the role that sampling effort plays in species detection is primarily based on 16S data. The dependency between these two variables may be less severe for the authors' metagenomic pipeline. This potential discrepancy raises a broader issue regarding the investigation of microbial macroecological patterns and the inference of ecological parameters. Often microbial macroecology researchers rely on 16S rRNA amplicon data because that type of data is abundant and comparatively low-cost. Some in microbiology and bioinformatics are increasingly pushing researchers to choose metagenomics over 16S. Sometimes this choice is valid (discovery of new MAGs, investigate allele frequency changes within species, etc.), sometimes it is driven by the false equivalence "more data = better". The outcome, though, is that we have a body of more-or-less established microbial macroecological patterns which rest on 16S data and are now slowly incorporating results from metagenomics. To my knowledge, there has not been a systematic evaluation of the macroecological patterns that do and do not vary by one's choice in 16S vs. metagenomics. 
      Several of the authors in this manuscript have previously compared the MAD shape for 16S and metagenomic datasets in Pasqualini et al., but moving forward, a more comprehensive study seems necessary.

      We thank the reviewer for this insightful and nuanced comment, which particularly highlights the broader methodological context of our data sources. Indeed, metagenomic sequencing introduces different biases with respect to 16S data. First, we would like to emphasize that we estimated the order parameters from the data by using relative abundances. Second, while the concern regarding the influence of sequencing depth and species diversity on the estimation of the order parameters is valid, we refer to a previous publication by some of the authors (Pasqualini et al., 2024; see Figure 4, panels g and h). There, we pointed out that the observed outcome is weakly influenced by sequencing depth in our dataset, while the main impact on the order parameters estimate comes from the species diversity of the two groups. In the same publication, we showed that other well-known patterns (species abundance distribution, mean abundance distribution) are also observed. Also, to mitigate the effect of the number of samples and sequencing depth, we estimated the order parameters by a bootstrap procedure (90% of samples for healthy and diseased groups, 5000 resamples), which resulted in the error bars in Figure 2.
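
      The bootstrap described above (90% of samples per group, 5000 resamples) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it takes h to be the mean of 1/(number of species) across samples, which follows Reviewer #3's description rather than a definition given in the response, it assumes the 90% subsets are drawn without replacement, and the function name and richness values are hypothetical.

```python
import random
import statistics


def bootstrap_inverse_richness(richness, frac=0.9, n_resamples=5000, seed=0):
    """Subsampling estimate of h = mean(1 / species count) with an error bar.

    richness: list of species counts, one per sample (each must be > 0).
    frac and n_resamples mirror the 90% / 5000 figures in the response;
    drawing without replacement is an assumption.
    Returns (mean of the resampled estimates, their standard deviation).
    """
    rng = random.Random(seed)
    inv = [1.0 / s for s in richness]
    k = max(1, round(frac * len(inv)))  # size of each subsample
    estimates = []
    for _ in range(n_resamples):
        subset = rng.sample(inv, k)  # draw k samples without replacement
        estimates.append(statistics.fmean(subset))
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Made-up richness values for two hypothetical groups.
healthy = [120, 135, 110, 140, 125, 130, 115, 128, 122, 138]
diseased = [80, 60, 95, 70, 100, 65, 85, 75, 90, 55]
print(bootstrap_inverse_richness(healthy))
print(bootstrap_inverse_richness(diseased))
```

      The standard deviation of the resampled estimates plays the role of the error bar; repeating the call separately for each group gives one estimate and one error bar per group.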

      We also fully agree with the broader call for a systematic comparison of macroecological patterns derived from 16S and metagenomic data. While some of us have already begun exploring this direction (e.g., Pasqualini et al., 2024), the reviewer’s comment highlights its significance and motivates us to pursue a more comprehensive, integrative analysis across data types. While we found qualitative agreement of these patterns with previous publications (e.g., Grilli, Nature Comm. 2020), we will acknowledge this as an important future direction in the Discussion section.

      References

      (1) Seppi, M., Pasqualini, J., Facchin, S., Savarino, E.V. and Suweis, S., 2023. Emergent functional organization of gut microbiomes in health and diseases. Biomolecules, 14(1), p.5.

      (2) Pasqualini, J., Facchin, S., Rinaldo, A., Maritan, A., Savarino, E. and Suweis, S., 2024. Emergent ecological patterns and modelling of gut microbiomes in health and in disease. PLOS Computational Biology, 20(9), p.e1012482.

      (3) Mallmin, E., Traulsen, A. and De Monte, S., 2024. Chaotic turnover of rare and abundant species in a strongly interacting model community. Proceedings of the National Academy of Sciences, 121(11), p.e2312822121.

      (4) Altieri, A., Roy, F., Cammarota, C. and Biroli, G., 2021. Properties of equilibria and glassy phases of the random Lotka-Volterra model with demographic noise. Physical Review Letters, 126(25), p.258301.

      (5) Grilli, J., 2020. Macroecological laws describe variation and diversity in microbial communities. Nature Communications, 11(1), p.4743.

    1. Author response:

      Reviewer 1:

      (1) Clarification of axon mistargeting patterns and model interpretation

      We will clarify the apparent discrepancy between chick and mouse axon mistargeting data. Specifically, we will expand the explanation in the main text and Figure 7 legend and/or revise the model in Figure 7 to better reflect observed phenotypes and clarify how Sp1 overexpression contributes to mistargeting.

      (2) Evidence for Sp1-dependent ephrin expression

      We agree that demonstrating ephrin expression changes in motor neurons is essential. We will:

      • Conduct in situ hybridization and/or immunostaining for ephrins in control and Sp1 mutant spinal cords from both chick and mouse embryos.

      • Clarify and expand the methodological details of the NSC-34 cell experiments shown in Figure 4G.

      (3) RNA-seq experiment details

      We will revise the Methods section to provide additional experimental details.

      (4) Use of Syn1-cre

      We acknowledge concerns about the broad expression of Syn1-cre. To address this:

      We will clarify our rationale for using Syn1-cre and describe its expression pattern in the spinal cord.

      We are evaluating the feasibility of additional experiments using a motor neuron-specific Cre driver to confirm cell-type specificity.

      We will include a new paragraph in the Discussion addressing potential contributions from other neuronal populations.

      Reviewer 2:

      (1) & (2) Clarification and localization of RNA-seq data

      We will expand the Methods section to provide greater detail on the RNA-seq approach. In addition, we will validate ephrin downregulation in LMC neurons using in situ hybridization and/or immunostaining.

      (3) Integration of ChIP and RNA-seq data

      We will:

      Report additional ChIP peaks for ephrinA5 and other differentially expressed genes such as Sema7a.

      Add a summary figure that integrates ChIP and RNA-seq results to strengthen the link between Sp1 binding and transcriptional regulation.

      (4) Clarification of the cis-attenuation model

      We recognize that our data do not yet directly demonstrate Sp1’s role in cis-attenuation. To address this:

      We will revise the abstract and main text to frame Sp1's role in cis-attenuation as a hypothesis.

      We are exploring the feasibility of ephrinA5 and B2 rescue experiments in Sp1-deficient embryos to test specificity.

      (5) Behavioral phenotypes and cell-type specificity

      We will clarify that behavioral phenotypes may result from combined effects across neuron populations due to Syn1-cre expression. To address this:

      We are planning rescue experiments with Sp1 expression in chick embryos to test for rescue of axon misrouting.

      We will include a new paragraph in the Discussion to highlight this limitation and discuss alternative interpretations.

      Reviewer 3:

      We appreciate your positive evaluation and support for the rigor of our study.

      In response to your suggestions:

      We are revising the manuscript to improve clarity and flow, particularly the transitions between datasets.

      We will update Figure 7 and the associated text to more clearly convey the working model and avoid overinterpretation.

      We thank all reviewers for their constructive feedback and are committed to addressing each point thoroughly. All revisions will be clearly marked in the resubmitted manuscript.

    1. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.

      (3) Data is presented for expression of Panx channels in different cell types (HEK vs HEKS GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflects fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton-X100 addition (3 minutes), enabling consistent quenching value comparisons. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the lysophospholipid agonistic effects. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
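
      The normalization described above can be written out as a short sketch. It assumes that the first point of the trace is the fluorescence at the moment of LPC addition and that a single value is available for the fluorescence after Triton-X100 addition; the function name and the numbers are hypothetical and are not taken from the manuscript.

```python
def quench_percent(trace, f_triton):
    """Convert a fluorescence trace into Quench (%).

    trace: fluorescence values over time, starting at LPC addition
           (trace[0] is taken as the unquenched baseline; an assumption).
    f_triton: fluorescence after Triton-X100 addition, taken here as the
              point of maximal decrease.
    """
    f0 = trace[0]
    max_drop = f0 - f_triton  # maximal decrease used for normalization
    if max_drop <= 0:
        raise ValueError("Triton-X100 value must be below the baseline")
    # Fluorescence lost at each time point, as a percentage of the maximal loss.
    return [100.0 * (f0 - f) / max_drop for f in trace]


# Made-up fluorescence readings (arbitrary units) for illustration only.
print(quench_percent([100.0, 95.0, 90.0], f_triton=50.0))
```

      Under this normalization a closed channel sits at 0% and complete quenching would read 100%, which matches the direction of the relabelled axis.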

      (5) Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion favouring Panx channels? We appreciate that the biophysical properties of Panx channel remain mysterious, but it would help to hear a bit more about the authors' thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given that new experiments were included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate the structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the extended NTD in the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC, without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It’s hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding; however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC induced current, also exhibited lower currents through voltage activation in Fig S5, raising the possibility that the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This is a comprehensive study that sheds light on how Wag31 functions and localises in mycobacterial cells. A clear link to interactions with CL is shown using microscopy in combination with fluorescent fusion constructs and lipid-specific dyes. Furthermore, studies using mutant versions of Wag31 shed light on the functionalities of each domain in the protein. My concerns/suggestions for the manuscript are minor:

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect on levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have clarified this point in the discussion of the revised manuscript. The lipid classes affected by depletion of Wag31 differ from those affected by its overexpression. Wag31 is an adaptor protein that interacts with proteins of the ACCase complex (Meniche et al., 2014; Xu et al., 2014) that synthesize fatty acid precursors and regulate their activity (Habibi Arejan et al., 2022).

      The varied response on lipid homeostasis could be attributed to a change in the stoichiometry of these interactions of Wag31. While Wag31 depletion would prevent such interactions from occurring and might affect lipid synthesis that directly depends on Wag31-protein partner interactions, its overexpression would lead to promiscuous interactions and a change in the stoichiometry of native interactions that would ultimately modulate lipid synthesis pathways.

      (2) The pulldown assays results are interesting, but links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-Wag31 complemented at an integrative locus in the Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates but not in the control were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.

      As mentioned in line 139 of the previous version of the manuscript, we agree that the interactions can either be direct or through a third partner. The fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays—the wash buffer contained 1% Triton X100 that eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. As mentioned above, this caveat was stated in the previous version of the manuscript.
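
      The selection logic described in this response (present in every replicate, absent from the control, then ranked and cut off) can be sketched as follows. This is an illustration only, not the authors' analysis pipeline: the data layout, the function name `select_interactors`, the reading of both cut-offs as per-replicate minimums, and the ranking by summed PSM count are all assumptions made for the example.

      ```python
      from typing import Dict, List, Set, Tuple

      # Hypothetical layout: one dict per biological replicate, mapping
      # protein name -> (PSM count, unique peptide count).
      Replicate = Dict[str, Tuple[int, int]]


      def select_interactors(
          replicates: List[Replicate],
          control: Set[str],
          min_psm: int = 18,      # assumed to be a minimum, applied per replicate
          min_unique: int = 5,    # assumed to be a minimum, applied per replicate
          top_n: int = 25,
      ) -> List[str]:
          """Return candidate interactors seen in all replicates but not in the control."""
          if not replicates:
              return []

          # Keep only proteins detected in every replicate.
          shared = set(replicates[0])
          for rep in replicates[1:]:
              shared &= set(rep)

          # Subtract anything that also appears in the control pulldown.
          shared -= control

          # Apply the (assumed) thresholds in each replicate.
          passing = [
              p for p in shared
              if all(rep[p][0] >= min_psm and rep[p][1] >= min_unique
                     for rep in replicates)
          ]

          # Rank by summed PSM count (assumption), breaking ties by name.
          passing.sort(key=lambda p: (-sum(rep[p][0] for rep in replicates), p))
          return passing[:top_n]
      ```

      A filter of this kind removes proteins that bind the tag or the beads, but it cannot by itself distinguish direct from indirect interactors, which is the caveat acknowledged above.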

      (3) The authors may perhaps like to rephrase claims of effects lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of various biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both of the changes observed in lipid metabolism.

      Reviewer #2 (Public review):

      Summary:

      Kapoor et al. investigated the role of the mycobacterial protein Wag31 in lipid and peptidoglycan synthesis and sought to delineate the role of the N- and C-terminal domains of Wag31. They demonstrated that modulating Wag31 levels influences lipid homeostasis in M. smegmatis and cardiolipin (CL) localisation in cells. Wag31 was found to preferentially bind CL-containing liposomes, and deleting the N-terminus of the protein significantly decreased this interaction. Novel interactions between Wag31 and proteins involved in lipid metabolism and cell wall synthesis were identified, suggesting that Wag31 recruits proteins to the intracellular membrane domain by direct interaction.

      Strengths:

      (1) The importance of Wag31 in maintaining lipid homeostasis is supported by several lines of evidence. (2) The interaction between Wag31 and cardiolipin, and the role of the N-terminus in this interaction was convincingly demonstrated.

      Weaknesses:

      (1) MS experiments provide some evidence for novel protein-protein interactions. However, the pulldown experiments lack a valid negative control.

      We thank the reviewer for the comment. We have included two non-interactors of Wag31 i.e. MmpL4 and MmpS5 which were not identified in our interactome database as negative controls in the experiment. As shown in Figure S3, we performed His pull-down experiments with both of them independently twice, each time with a positive control (known interactor of Wag31 (Msm2092)). Fig. S3b revised shows E. coli lysate expressing His-Wag31 which was incubated with Msm lysates expressing either FLAG tagged-MmpL4 or -MmpS5 or Msm2092 (revised Fig. S3c). The mixed lysates were pulled down with Cobalt beads that bind to the His-tagged protein and analysed using Western blot analysis by probing with anti-FLAG antibody (revised Fig. S3d.). The data presented confirms that the interactions validated through the pull down assay were indeed specific.

      (2) The role of the N-terminus in the protein-protein interaction has not been ruled out.

      We thank the reviewer for the comment. Wag31<sub>Msm</sub> is a 272-amino-acid protein. The N-terminal of Wag31, which houses the DivIVA-domain, comprises the first 60 amino acids. Previously, we attempted to express the N-terminal (60 aa long) and the C-terminal (212 aa long) truncated proteins in various mycobacterial shuttle vectors to perform MS/MS experiments. Despite numerous efforts, neither could be expressed, with an N/C-terminal FLAG tag or without a tag, in episomal or integrative vectors, owing to instability of the protein. Eventually, we successfully expressed the C-terminal Wag31 with an N- and C-terminal hexa-His tag. However, this expression was not sufficient or stable enough for us to perform Ni<sup>2+</sup>-affinity pull-down experiments for mass spectrometry. The N-terminal of Wag31 could not be expressed in M. smegmatis even with N- and C-terminal hexa-His tags.

      To rule out the role of the N-terminal in mediating protein-protein interactions, we cloned the N-terminal of Wag31 that comprises the DivIVA-domain in pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> (in the revised manuscript) were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7e-g). Thus, we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all four interactors whereas His-Wag31<sub>∆C</sub> couldn’t, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we can’t ignore the possibility of other interactors binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we are unable to perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Reviewer #3 (Public review):

      Summary:

      This manuscript describes the characterization of mycobacterial cytoskeleton protein Wag31, examining its role in orchestrating protein-lipid and protein-protein interactions essential for mycobacterial survival. The most significant finding is that Wag31, which directs polar elongation and maintains the intracellular membrane domain, was revealed to have membrane tethering capabilities.

      Strengths:

      The authors provided a detailed analysis of Wag31 domain architecture, revealing distinct functional roles: the N-terminal domain facilitates lipid binding and membrane tethering, while the C-terminal domain mediates protein-protein interactions. Overall, this study offers a robust and new understanding of Wag31 function.

      Weaknesses:

      The following major concerns should be addressed.

      • Authors use 10-N-Nonyl-acridine orange (NAO) as a marker for cardiolipin localization. However, given that NAO is known to bind to various anionic phospholipids, how do the authors know that what they are seeing is specifically visualizing cardiolipin and not a different anionic phospholipid? For example, phosphatidylinositol is another abundant anionic phospholipid in mycobacterial plasma membrane.

      We thank the reviewer for the comment. Despite its promiscuous binding to other anionic phospholipids, 10-N-Nonyl-acridine orange is widely used to stain Cardiolipin and determine its localisation in bacterial cells and mitochondria of eukaryotes (Garcia Fernandez et al., 2004; Mileykovskaya & Dowhan, 2000; Renner & Weibel, 2011). This is because it has a stronger affinity for Cardiolipin than for other anionic phospholipids, with an affinity constant of 2 × 10<sup>6</sup> M<sup>−1</sup> for association with Cardiolipin and 7 × 10<sup>4</sup> M<sup>−1</sup> for association with phosphatidylserine and phosphatidylinositol (Petit et al., 1992). Additionally, no other stain is yet available for detecting Cardiolipin. Our protein-lipid binding assays suggest that Wag31 preferentially binds to Cardiolipin over other anionic phospholipids (Fig. 4b). Hence, it is likely that most of the redistribution of NAO fluorescence that we observe is contributed by Cardiolipin mislocalization due to altered Wag31 levels, with a smaller degree of NAO redistribution coming indirectly from other anionic phospholipids displaced from the membrane by the loss of membrane integrity and the cell shape changes that accompany altered Wag31 levels.
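
      For orientation, the two affinity constants quoted above from Petit et al. (1992) differ by roughly 29-fold. The ratio below is simple arithmetic on those reported values, not an additional measurement; the subscripts CL and PS/PI are labels chosen here for cardiolipin and for phosphatidylserine/phosphatidylinositol.

      ```latex
      % Ratio of the affinity constants quoted from Petit et al. (1992).
      \[
        \frac{K_{\mathrm{CL}}}{K_{\mathrm{PS/PI}}}
        \;=\; \frac{2 \times 10^{6}\ \mathrm{M^{-1}}}{7 \times 10^{4}\ \mathrm{M^{-1}}}
        \;\approx\; 29
      \]
      ```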

      • Authors' data show that the N-terminal region of Wag31 is important for membrane tethering. The authors' data also show that the N-terminal region is important for sustaining mycobacterial morphology. However, the authors' statement in Line 256 "These results highlight the importance of tethering for sustaining mycobacterial morphology and survival" requires additional proof. It remains possible that the N-terminal region has another unknown activity, and this yet-unknown activity rather than the membrane tethering activity drives the morphological maintenance. Similarly, the N-terminal region is important for lipid homeostasis, but the statement in Line 270, "the maintenance of lipid homeostasis by Wag31 is a consequence of its tethering activity" requires additional proof. The authors should tone down these overstatements or provide additional data to support their claims.

      We agree with the reviewer that there exists a possibility for another function of the N-terminal that may contribute to sustaining mycobacterial physiology and survival. We will revise our statements in the paper to reflect the data. The results shown suggest that the tethering activity of the N-terminal region may contribute to mycobacterial morphology and survival. However, additional functions of this region can’t be ruled out. Similarly, the maintenance of lipid homeostasis by Wag31 may be associated with its tethering activity, although other mechanisms could also contribute to this process.

      • Authors suggest that Wag31 acts as a scaffold for the IMD (Fig. 8). However, Meniche et al. have shown that MurG as well as GlfT2, two well-characterized IMD proteins, do not colocalize with Wag31 (DivIVA) (https://doi.org/10.1073/pnas.1402158111). IMD proteins are always slightly subpolar while Wag31 is located at the tip of the cell. Therefore, the authors' biochemical data cannot be easily reconciled with microscopic observations in the literature. This raises a question regarding the validity of the protein-protein interactions shown in Figure 7. Since this pull-down assay was conducted by mixing E. coli lysate expressing Wag31 and Msm lysate expressing Wag31 interactors like MurG, it is possible that the interactions are not direct. Authors should interpret their data more cautiously. If authors cannot provide additional data and sufficient justifications, they should avoid proposing a confusing model like Figure 8 that contradicts published observations.

      In the literature, MurG and GlfT2 have been shown to have polar localisation (Freeman et al., 2023; Hayashi et al., 2016; Kado et al., 2023), and two groups have shown slightly sub-polar localisation of MurG (García-Heredia et al., 2021; Meniche et al., 2014). Additionally, Freeman et al. (2023) showed SepIVA to be a spatio-temporal regulator of MurG. MS/MS analysis of Wag31 immunoprecipitation data yielded both MurG and SepIVA as interactors of Wag31 (Fig. 3). Given that Wag31 also displays polar localisation, it is likely that it associates with the polar MurG. However, since a sub-polar localisation of MurG has also been reported, it is possible that they do not interact directly and another protein mediates their interaction. Based on the above, we will modify the model proposed in Fig. 8.

      We agree that for validation of interaction, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used were quite stringent for these pull-down assays—the wash buffer contained 1% Triton X100 that eliminates all non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript and propose a model that reflects the results we obtained.

      References:

      Freeman, A. H., Tembiwa, K., Brenner, J. R., Chase, M. R., Fortune, S. M., Morita, Y. S., & Boutte, C. C. (2023). Arginine methylation sites on SepIVA help balance elongation and septation in Mycobacterium smegmatis. Mol Microbiol, 119(2), 208-223. https://doi.org/10.1111/mmi.15006

      Garcia Fernandez, M. I., Ceccarelli, D., & Muscatello, U. (2004). Use of the fluorescent dye 10-N-nonyl acridine orange in quantitative and location assays of cardiolipin: a study on different experimental models. Anal Biochem, 328(2), 174-180. https://doi.org/10.1016/j.ab.2004.01.020

      García-Heredia, A., Kado, T., Sein, C. E., Puffal, J., Osman, S. H., Judd, J., Gray, T. A., Morita, Y. S., & Siegrist, M. S. (2021). Membrane-partitioned cell wall synthesis in mycobacteria. eLife, 10. https://doi.org/10.7554/eLife.60263

      Habibi Arejan, N., Ensinck, D., Diacovich, L., Patel, P. B., Quintanilla, S. Y., Emami Saleh, A., Gramajo, H., & Boutte, C. C. (2022). Polar protein Wag31 both activates and inhibits cell wall metabolism at the poles and septum. Front Microbiol, 13, 1085918. https://doi.org/10.3389/fmicb.2022.1085918

      Hayashi, J. M., Luo, C. Y., Mayfield, J. A., Hsu, T., Fukuda, T., Walfield, A. L., Giffen, S. R., Leszyk, J. D., Baer, C. E., Bennion, O. T., Madduri, A., Shaffer, S. A., Aldridge, B. B., Sassetti, C. M., Sandler, S. J., Kinoshita, T., Moody, D. B., & Morita, Y. S. (2016). Spatially distinct and metabolically active membrane domain in mycobacteria. Proc Natl Acad Sci U S A, 113(19), 5400-5405. https://doi.org/10.1073/pnas.1525165113

      Kado, T., Akbary, Z., Motooka, D., Sparks, I. L., Melzer, E. S., Nakamura, S., Rojas, E. R., Morita, Y. S., & Siegrist, M. S. (2023). A cell wall synthase accelerates plasma membrane partitioning in mycobacteria. eLife, 12, e81924. https://doi.org/10.7554/eLife.81924

      Meniche, X., Otten, R., Siegrist, M. S., Baer, C. E., Murphy, K. C., Bertozzi, C. R., & Sassetti, C. M. (2014). Subpolar addition of new cell wall is directed by DivIVA in mycobacteria. Proc Natl Acad Sci U S A, 111(31), E3243-3251. https://doi.org/10.1073/pnas.1402158111

      Mileykovskaya, E., & Dowhan, W. (2000). Visualization of phospholipid domains in Escherichia coli by using the cardiolipin-specific fluorescent dye 10-N-nonyl acridine orange. J Bacteriol, 182(4), 1172-1175. https://doi.org/10.1128/JB.182.4.1172-1175.2000

      Petit, J. M., Maftah, A., Ratinaud, M. H., & Julien, R. (1992). 10N-nonyl acridine orange interacts with cardiolipin and allows the quantification of this phospholipid in isolated mitochondria. Eur J Biochem, 209(1), 267-273. https://doi.org/10.1111/j.1432-1033.1992.tb17285.x

      Renner, L. D., & Weibel, D. B. (2011). Cardiolipin microdomains localize to negatively curved regions of Escherichia coli membranes. Proc Natl Acad Sci U S A, 108(15), 6264-6269. https://doi.org/10.1073/pnas.1015757108

      Schägger, H. (2006). Tricine-SDS-PAGE. Nat Protoc, 1(1), 16-22. https://doi.org/10.1038/nprot.2006.4

      Xu, W. X., Zhang, L., Mai, J. T., Peng, R. C., Yang, E. Z., Peng, C., & Wang, H. H. (2014). The Wag31 protein interacts with AccA3 and coordinates cell wall lipid permeability and lipophilic drug resistance in Mycobacterium smegmatis. Biochem Biophys Res Commun, 448(3), 255-260. https://doi.org/10.1016/j.bbrc.2014.04.116

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ln 130. A better clarification/discussion is required here. It is clear that both depletion and overexpression have an effect in levels of various lipids, but subsequent descriptions show that they affect different classes of lipids.

      We thank the reviewer for the comment. We have included a clarification for this in the discussion section.

      (2) The pulldown assays results are interesting, but the links are tentative.

      We thank the reviewer for the comment. The interactome of Wag31 was identified through the immunoprecipitation of FLAG-tagged Wag31 complemented at an integrative locus in a Wag31 mutant background to avoid overexpression artifacts. We used Msm::gfp expressing an integrative copy (at the L5 locus) of FLAG-GFP as a control to subtract non-specific interactions. The experiment was performed in biological triplicates, and interactors that appeared in all replicates were selected for further analysis. Although we identified more than 100 interactors of Wag31, we analyzed only the top 25 hits, with a PSM cut-off of 18 and a unique-peptide cut-off of 5. Additionally, two of Wag31's established interactors, AccD5 and Rne, were among the top five hits, thus validating our data.

      Though we agree that the interactions can be either direct or through a third partner, the fact that we obtained known interactors of Wag31 makes us believe these interactions are genuine. Moreover, for validation, we performed pulldown experiments by mixing E. coli lysates expressing His-Wag31 full-length or truncated protein with M. smegmatis lysates expressing FLAG-tagged interacting proteins. The wash conditions used for these pull-down assays were quite stringent: the wash buffer contained 1% Triton X-100, which eliminates non-specific and indirect interactions. However, we agree that we cannot conclusively state that the interactions are direct without purifying the proteins and performing the experiment. We will describe this caveat in the revised manuscript.
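      The selection logic described in this response (hits present in every replicate, control hits subtracted, then PSM and unique-peptide cut-offs) can be sketched as follows. This is only an illustration under stated assumptions: the protein names and counts are invented placeholders, the cut-offs are treated as "greater than or equal to", they are applied in every replicate, and any protein seen in the GFP control is discarded. None of these details is confirmed by the response itself.

```python
def select_interactors(bait_reps, control_reps, min_psm=18, min_unique=5):
    """Return proteins found in all bait replicates, absent from the control,
    and meeting both cut-offs in every bait replicate.

    Each replicate is a dict: protein name -> (PSM count, unique peptides).
    The >= direction of the cut-offs is an assumption.
    """
    in_all = set.intersection(*(set(rep) for rep in bait_reps))
    in_control = set().union(*(set(rep) for rep in control_reps))
    hits = []
    for protein in sorted(in_all - in_control):
        if all(rep[protein][0] >= min_psm and rep[protein][1] >= min_unique
               for rep in bait_reps):
            hits.append(protein)
    return hits


# Placeholder data, not taken from the study.
bait = [
    {"ProtA": (30, 8), "ProtB": (20, 4), "ProtC": (25, 6), "Sticky": (40, 9)},
    {"ProtA": (22, 6), "ProtB": (21, 7), "ProtC": (19, 5), "Sticky": (35, 7)},
    {"ProtA": (18, 5), "ProtB": (25, 6), "Sticky": (50, 10)},
]
control = [{"Sticky": (12, 3)}]

hits = select_interactors(bait, control)
print(hits)  # ['ProtA']
```

      In this toy input, ProtC is dropped because it is missing from one replicate, ProtB because it falls below the unique-peptide cut-off in one replicate, and Sticky because it also appears in the control.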

      (3) The authors may perhaps like to rephrase claims of effects on lipid homeostasis, as my understanding is that lipid localisation rather than catabolism/breakdown is affected.

      We thank the reviewer for the comment. In this manuscript, we are trying to convey that Wag31 is a spatiotemporal regulator of lipid metabolism. It is a peripheral protein that is hooked to the membrane via Cardiolipin and forms a scaffold at the poles, which helps localize several enzymes involved in lipid metabolism.

      Homeostasis is the process by which an organism maintains a steady state of balance and stability in response to changes. Depletion of Wag31 not only results in delocalisation of lipids into intracellular lipid inclusions but also leads to changes in the levels of various lipid classes. Advances in the field of spatial biology underscore the importance of the native localization of biological molecules for maintaining the steady state of the cell. Hence, we have used the word “homeostasis” to describe both changes observed in lipid metabolism.

      Reviewer #2 (Recommendations for the authors):

      I recommend the following experiments to strengthen the data presented:

      (1) Include a non-interacting FLAG-tagged protein as a negative control in the pull-down experiment to strengthen this data.

      We thank the reviewer for the comment. As suggested, we have included non-interacting FLAG-tagged proteins as negative controls in the pulldown experiment. We chose MmpL4 and MmpS5, which were not found in the Wag31 interactome data. We performed pull-down experiments with both of them and included an interactor of Wag31, i.e. Msm2092, as a positive control. Fig. S3b revised shows E. coli lysate expressing His-Wag31, which was incubated with Msm lysates expressing either FLAG-tagged MmpL4, -MmpS5 or -Msm2092 (Fig. S3c revised). The mixed lysates were pulled down with cobalt beads that bind to the His-tagged protein and analysed by Western blot, probing with anti-FLAG antibody. The pull-down experiments were performed independently twice, each time with Msm2092 as the positive control (Fig. S3d revised).

      (2) Perform the pull-down experiments using only the Wag31 N-terminus to rule out any role that it may have in the protein-protein interactions.

      We thank the reviewer for the comment. To rule out a role for the N-terminal of Wag31 in mediating protein-protein interactions, we cloned the N-terminal of Wag31, which comprises the DivIVA domain, in the pET28b vector (Fig. 7a revised). Subsequently, the truncated protein, hereafter called Wag31<sub>∆C</sub>, flanked by 6X His tags at both termini, was expressed in E. coli and mixed with Msm lysates expressing interactors of Wag31 (Fig. 7b-c revised). Earlier experiments with Wag31<sub>∆1-60</sub> or Wag31<sub>∆N</sub> were performed with MurG, SepIVA, Msm2092 and AccA3 (Fig. 7 previous), so we used the same set of interactors to test our hypothesis. Briefly, His-Wag31<sub>∆C</sub> was mixed with Msm lysates expressing either FLAG-MurG, -SepIVA, -Msm2092 or -AccA3, and pull-down experiments were performed as described previously. FLAG-MmpS5, a non-interactor of Wag31, was used as a negative control. As shown in Fig. 7d revised, His-Wag31 could bind to all four interactors whereas His-Wag31<sub>∆C</sub> couldn’t, strengthening the conclusion that interactions of Wag31 with other proteins are mediated by its C-terminal. However, we can’t ignore the possibility of other proteins binding to the N-terminal of Wag31. Unfortunately, due to poor expression/instability of Wag31<sub>∆C</sub> in mycobacterial shuttle vectors, we couldn’t perform a global interactome analysis of Wag31<sub>∆C</sub>.

      Minor comments:

      - Please check the legend of Fig. 1g, it appears to be labelled incorrectly.

      We have checked it. It is correct. From Fig. 1g we are trying to reflect on the percentages of cells of the three strains i.e. Msm+ATc, Δwag31-ATc, and Δwag31+ATc displaying rod, round or bulged morphology.

      - For MS/MS analysis, a GFP control is mentioned but it is not indicated how this was incorporated in the data analysis. This information should be added.

      We have incorporated that in the revised methodology.

      - The information presented in Fig. 3a, e and f could be combined in one table.

      We appreciate the idea of the reviewer but we prefer a pictorial representation of the data. It allows readers to consume the information in parts, make quicker comparisons and understand trends easily.

      - Fig. 4c Wag31K20A appears smaller in size than the wild-type protein - why is this the case? Is this not a single amino acid substitution?

      Though K20A is a single amino acid substitution, it alters the mobility of Wag31 on an SDS-PAGE gel. Sequence analysis of the plasmid expressing Wag31<sub>K20A</sub> doesn’t show any mutations other than the desired K20A. The change in mobility could be due to a change in the conformation of Wag31<sub>K20A</sub>, in its ability to bind SDS, or both, any of which would modify its mobility in an electric field.

      - Please clarify what is contained in the first panel of fig 4e. compared to what is in the second panel.

      The first panel represents CL-Dil-Liposomes before incubation with Wag31-GFP and the second panel shows CL-Dil-Liposomes after incubation with Wag31-GFP. The third panel shows the mixture as observed in the green channel to investigate the localisation of Wag31-GFP in the liposome-protein mix. The fourth panel shows the merge of the second and third panels.

      - The data in Fig 6d suggests higher levels of CL in the ∆wag31 compared to wild-type - how do the authors reconcile this with the MS data in Fig. 2g showing lower CL levels?

      Fig. 6d represents the distribution of CL localisation in the tested strains of mycobacteria whereas Fig. 2g shows the absolute levels of CL in various strains. We place greater confidence in the lipidomics data, which suggest downregulation of CL species. The NAO staining and microscopy serve only to study the localization of CL along the cell and cannot be used to reliably quantify CL or be equated to CL levels. Staining with a probe such as NAO depends on factors such as the hydrophobicity and permeability of the cell wall, which we expect to be severely altered in a Wag31 mutant. Therefore, the increased NAO staining seen in the Wag31 mutant could simply reflect increased uptake of the dye rather than absolute levels of CL. The specificity of staining and localization, however, can be expected to be unaltered.

      Reviewer #3 (Recommendations for the authors):

      Following are suggestions for improving the writing and presentation.

      • Figure 1, the meaning of the yellow arrows present in f and h should be mentioned in the figure legend.

      We have incorporated that in the revised legend. In Fig.1f, the yellow arrowhead represents the bulged pole morphology whereas in Fig. 1h, it indicates intracellular lipid inclusions.

      • Figure 7 legend refers to panels g, h, and i. However, Figure 7 only has panels a-c. The legend lacks a description of panel c.

      We have corrected the typos and the legend.

      • Figure S1, F2-R2 and F3-R3 expected sizes should be stated in the legend of the figure.

      We have updated the legends.

      • Figure S5, is this the same figure as 5e? If so, there is no need for this figure.

      We have removed Fig. S5.

      • Methods need to be written more carefully with enough details. I listed some of the concerns below.

      Detailed methodology was previously provided in the supplementary material and now we have moved it to the materials and methods in the revised manuscript.

      • Line 392, provide more details on western blotting. What is the secondary antibody? What image documentation system was used?

      We have updated the methodology.

      • Line 400, while the methods may be the same as the reference 64, authors should still provide key details such as the way samples were fixed and processed for SEM and TEM.

      We have provided a detailed description of the same in methodology in the revised version.

      • Line 437, how do authors calculate the concentration of liposome to be 10 µM? Do they possibly mean the concentration of phospholipids used to make the liposomes?

      Yes, this is the concentration of total lipids used to make liposomes. 1 μM of Wag31 or its mutants was mixed with 100 nm extruded liposomes containing 10 μM total lipid in separate Eppendorf tubes.

      • Supplemental Line 9, "turns of" should read "turns off".

      We have edited this.

      • Supplemental Line 13, define LHS and RHS.

      LHS (left-hand sequence) and RHS (right-hand sequence) refer to the upstream and downstream flanking regions of the gene of interest.

      • Supplemental Line 20, indicate the manufacturer of the microscope and type of the objective lens.

      We have added these details now.

      • Supplemental Line 31, define MeOH, or use a chemical formula like chloroform.

      MeOH is methanol. We have provided a chemical formula in the revised version.

      • Supplemental Line 53, indicate the concentration of trypsin.

      We have included that in the revised version.

      • Supplemental Line 72, g is not a unit. "30,000 g" should be "30,000x g".

      We have revised this in the manuscript.

      • Supplemental Line 114, provide more details on western blotting. What is the manufacturer of anti-FLAG antibody? What is the secondary antibody? How was the antibody binding visualized? What image documentation system was used?

      We have provided these details in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight on the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance an in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

      The method at the focus of our study (mwSuMD) is an enhancement of supervised molecular dynamics that allows supervising two metrics at the same time and uses a score, rather than a tabù-like algorithm, for handling the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation on different MD engines (e.g. Acemd, OpenMM, NAMD, Gromacs).

      We agree with the Reviewer that experimental validation of the findings would be advisable, in line with any computational prediction. We are positive that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that the 7LCI would have been better as a reference for all simulations than 7LCJ, also because 7LCI holds a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point than 7LCJ for simulations because it presents the stalk region, contrary to 7LCJ. However, we do not think this would have influenced the output because the stalk is the most flexible segment of GLP1R, and any initial conformation is usually not retained during MD simulations.

      Please, correct everywhere the definition of the 6LN2 structure of GLP1R as a ligand-free or apo, because that structure is indeed bound to a negative allosteric modulator docked on the cytosolic end of helix-6.

      We thank the reviewer for this precision. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me the reason why the authors preferred to use an AlphaFold model as an input of simulations rather than a high resolution structural model, e.g. 4LDO. Perhaps, the reason is that all ICL regions, including ICL3, were modeled by AlphaFold even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Had we simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this piece of information about PF06882961 RMSD in the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”
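      The way such a "mean ± SD over the second half of the trajectory" value is obtained can be sketched as follows. This is a minimal illustration, assuming every frame has already been superimposed on the receptor Cα atoms (no fitting step is included), that coordinates are plain (x, y, z) tuples in Å, and that the sample standard deviation is reported. The function names are hypothetical and the numbers are not taken from the study.

```python
import math
import statistics


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)
    coordinates, assumed to be already superimposed in a common frame."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for point_a, point_b in zip(coords_a, coords_b)
                  for a, b in zip(point_a, point_b))
    return math.sqrt(squared / len(coords_a))


def second_half_summary(per_frame_values):
    """Mean and sample standard deviation over the second half of a series."""
    half = per_frame_values[len(per_frame_values) // 2:]
    return statistics.mean(half), statistics.stdev(half)


# Toy ligand with two atoms, displaced by 1 Å along z relative to the reference.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))

# Toy per-frame RMSD series: only the last three values enter the summary.
mean_rmsd, sd_rmsd = second_half_summary([9.0, 8.0, 7.0, 4.0, 3.0, 5.0])
print(f"{mean_rmsd:.2f} ± {sd_rmsd:.2f} Å")
```

      Restricting the summary to the second half discards the approach phase of the binding simulation, so the reported value describes the final bound ensemble rather than the whole pathway.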

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1<sup>49</sup> (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion and still give the readers the measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with G<sub>s</sub> and GLP-1 (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which indicates overall higher flexibility of G<sub>β</sub> and G<sub>γ</sub> compared to G<sub>α</sub>, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R.

      However, the GLP-1R (and other class B receptors) fully active state, as reported in 7LCJ, depends on the presence of the Gs and can be reached only upon effector coupling. Since it is unlikely that the unbound receptor is already in the fully active state, we reasoned that considering it as a starting point for Gs binding simulations would have been an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, serving as an alternative approach to enhancing sampling compared to the "input of energy" techniques as called by the authors. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabù-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work. We provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabù-like algorithm but rather a continuative approach based on a score to select the best walker for each batch, as described in the Methods.
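      The batch-and-select idea described here (several short walkers per batch, a score over two supervised metrics, the best walker seeding the next batch) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the walker is a random perturbation standing in for a short MD run, and the score (summed fractional decrease of two distance-like metrics) is a placeholder, not the score defined in the manuscript's Methods.

```python
import random


def run_walker(state, rng, step=0.5):
    """Stand-in for one short MD walker: randomly perturb both metrics."""
    return tuple(max(0.0, metric + rng.gauss(0.0, step)) for metric in state)


def score(start, end):
    """Toy score: summed fractional decrease of the metrics (higher is better)."""
    return sum((s - e) / s for s, e in zip(start, end) if s > 0)


def supervise(state, n_walkers=5, n_batches=20, seed=0):
    """Run batches of walkers; the best-scoring walker always seeds the next
    batch, even when its score is negative (no tabù-like rejection step)."""
    rng = random.Random(seed)
    history = [state]
    for _ in range(n_batches):
        walkers = [run_walker(state, rng) for _ in range(n_walkers)]
        state = max(walkers, key=lambda walker: score(history[-1], walker))
        history.append(state)
    return history


# Two hypothetical supervised metrics, e.g. a distance and an RMSD, both in Å.
trajectory = supervise((10.0, 8.0), seed=1)
print(trajectory[0], trajectory[-1])
```

      Because a walker is always selected, the simulation keeps advancing from batch to batch, which is the sense in which the approach is continuative rather than tabù-like.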

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: In two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) with only IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, so the response to the mixture of MSG and 1 mM ornithine clearly exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as “synergistic” rather than “additive.” The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”
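      The additivity criterion discussed in this exchange can be written as a small check. The following is only an illustrative sketch, not the analysis used in the study: the function name `classify_mixture` and the tolerance parameter `tol` are hypothetical, and the numbers in the usage note are the reviewer's schematic "2 + 2 = 5" example rather than measured nerve responses.

```python
def classify_mixture(resp_a, resp_b, resp_mix, tol=0.0):
    """Compare a mixture response with the sum of its component responses.

    resp_a, resp_b: responses to each stimulus presented alone.
    resp_mix: response to the two stimuli presented together.
    tol: hypothetical margin within which the mixture counts as additive.
    """
    expected = resp_a + resp_b  # purely additive prediction
    if resp_mix > expected + tol:
        return "synergistic"
    if resp_mix < expected - tol:
        return "sub-additive"
    return "additive"
```

      With the reviewer's schematic values, `classify_mixture(2, 2, 5)` is labeled synergistic, whereas `classify_mixture(2, 2, 4)` is labeled additive. All three responses must be expressed on the same scale before they are compared, which is the point of the note about the ordinate in Fig. 5D and 5E.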

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      Strengths:

      The overall experimental design is solid based on two bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants including: inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour) and quinine hydrochloride (bitter). Robust effects of ornithine were observed in the cases of IMP, MSG, MPG and sucrose; and little or no effects were observed in the cases of sodium chloride, citric acid and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations as well as they increase the suprathreshold responses of tastants. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two bottle preference experiment (e.g., one of the known gamma glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the authors' arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels like it is shoehorned into this paper, and would be better published as a different manuscript.

      This human study is short, but it is complete rather than preliminary. The rationale for us to include the human data as supplementary information is shown in responses to the reviewer’s Public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      “Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans”.

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘mouthfulness,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, whereas CaSR is activated with high potency by γ-Glu-Val-Gly as well as by amino acids. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumour microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

      We sincerely thank the editors for their valuable and constructive feedback. We are grateful for the recognition of our findings and the importance of identifying transcriptional components in high-grade serous ovarian cancers.

      We acknowledge the editors’ observation regarding the descriptive nature of our study and its limited mechanistic depth. We agree that additional experimental validation would further strengthen our conclusions. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study. In addition, recent reviews of the emerging field of cancer neuroscience emphasize that the field is still in its early stages, specifically in terms of a mechanistic understanding of the contributions of tumor-infiltrating nerves to tumor initiation and progression (Amit et al., 2024; Hwang et al., 2024). Nonetheless, we wish to emphasize that emerging mechanistic preclinical studies have demonstrated the influence of tumour-infiltrating nerves on disease progression (Allen et al., 2018; Balood et al., 2022; Darragh et al., 2024; Globig et al., 2023; Jin et al., 2022; Restaino et al., 2023; Zahalka et al., 2017). Several of these studies include contributions from our co-authors and feature in vitro and in vivo research on head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma samples. The present study further strengthens the preclinical work by showing, in patient data, the potential relevance of neuronal signaling to disease outcome.

      For instance, Restaino et al. (2023) demonstrated that substance P, released from tumour-infiltrating nociceptors, potentiates MAP kinase signaling in cancer cells, thereby driving disease progression. Crucially, this effect was shown to be reversible in vivo by blocking the substance P receptor (Restaino et al., 2023). These findings offer compelling evidence of the role of tumour innervation in cancer biology.

      Our current study in tumor samples of patients with high-grade serous ovarian cancer identifies a transcriptional component that is enriched for genes for which the protein is located in the synapse. We believe that the previously published mechanistic insights support our findings and suggest that this transcriptional component could serve as a valuable screening tool to identify innervated tumours based on bulk transcriptomes. Clinically, this information is highly relevant, as patients with innervated tumours may benefit from alternate therapeutic strategies targeting these innervations.

      Reviewer #1 (Public review)

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumour microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on using consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      We thank the reviewer for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We consider that this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumour microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumour microenvironment in cancer progression.

      We appreciate the recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumour microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We acknowledge the point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would significantly enhance our understanding of how the biological processes captured by these transcriptional components influence cancer progression. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Our analyses were performed on publicly available bulk and spatially resolved expression profiles. To investigate the mechanistic insights in future studies, we plan to integrate spatial transcriptomic data with immunohistochemical analysis of the same tumour samples to validate our findings. Additionally, we have initiated efforts to set up in vitro co-cultures of neurons and ovarian cancer cells. These co-cultures will enable us to investigate how synaptic signaling impacts ovarian cancer cell behavior.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      To respond to this remark, we utilized survival data from Bolton et al. (2022) and TCGA to investigate associations between TC activity scores and overall survival of patients with ovarian clear cell carcinoma, the second most common subtype of epithelial ovarian cancer, and other cancer types, respectively. However, we acknowledge the limitations of TCGA survival data, as highlighted in the referenced article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/). Additionally, as shown in Figure 5, we provided evidence of TC121 activity across various cancer types, suggesting broader relevance. For the results of the analyses mentioned above, please refer to our response to remark 1.3 of the recommendation section (page 4).

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the value of validating our results with alternative platforms such as IHC. We are planning and executing the experiments for a future study to provide mechanistic insights into the associations found in this study.

      Regarding data quality control, we implemented the following measures to ensure the reliability of our analysis:

      Bulk Transcriptional Profiles: To assess data quality, we conducted principal component analysis (PCA) on the sample Pearson product-moment correlation matrix. The first principal component (PCqc), which explains approximately 80-90% of the variance, was used to distinguish technical variability from biological signals (Bhattacharya et al., 2020). Samples with a correlation coefficient below 0.8 relative to PCqc were identified as outliers and excluded. Additionally, MD5 hash values were generated for each CEL file to identify and remove duplicate samples. Expression values were standardized to a mean of zero and a variance of one for each gene to minimize probeset- or gene-specific variability across datasets (GEO, CCLE, GDSC, and TCGA).

      Spatial Transcriptional Profiles: PCA was also applied to spatial transcriptomic data for quality control. Only samples with consistent loading factor signs for the first principal component across all individual spot profiles were retained. Samples failing this criterion were excluded from further analyses.
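      The quality-control steps listed above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, the "correlation relative to PCqc" is assumed here to be the PCA loading of each sample on the first component (eigenvector entry times the square root of the eigenvalue), the leading eigenpair is obtained by simple power iteration, and population variance is assumed for the per-gene standardization.

```python
import hashlib
import math
import statistics


def pearson_matrix(samples):
    """Pearson correlation between every pair of sample profiles."""
    def corr(a, b):
        ma, mb = statistics.fmean(a), statistics.fmean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(va * vb)
    return [[corr(a, b) for b in samples] for a in samples]


def first_pc_loadings(corr, iters=500):
    """Correlation of each sample with the first principal component.

    Uses power iteration on the sample correlation matrix; the loading is
    the eigenvector entry scaled by sqrt(leading eigenvalue).
    """
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    eigval = 0.0
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = math.sqrt(sum(x * x for x in w))  # norm of C v, with |v| = 1
        v = [x / eigval for x in w]
    return [x * math.sqrt(eigval) for x in v]


def qc_outliers(samples, threshold=0.8):
    """Indices of samples whose loading on the first PC is below threshold."""
    loadings = first_pc_loadings(pearson_matrix(samples))
    return [i for i, r in enumerate(loadings) if r < threshold]


def drop_duplicates(files):
    """files: mapping name -> raw bytes. Keeps the first file per MD5 digest."""
    seen, kept = set(), []
    for name, payload in files.items():
        digest = hashlib.md5(payload).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


def standardize(values):
    """Scale one gene's values across samples to mean 0 and variance 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    return [(x - mu) / sd for x in values]


def consistent_signs(loadings):
    """Spatial QC: True if all spot loadings on the first PC share one sign."""
    return all(x > 0 for x in loadings) or all(x < 0 for x in loadings)
```

      In this sketch, a sample is flagged when its loading falls below the 0.8 cut-off quoted above, and a spatial sample would be retained only if `consistent_signs` returns `True` for the first-component loadings of its spots. In practice the file contents passed to `drop_duplicates` would be read from the CEL files on disk.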

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      Regarding clinical applications, we acknowledge the importance of further exploring strategies targeting synaptic signaling and neurotransmitter release in the tumour microenvironment (TME). As partially discussed in the first version of the manuscript, drugs such as ifenprodil and lamotrigine—commonly used to treat neuronal disorders—can block glutamate release, thereby inhibiting subsequent synaptic signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine blocks the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines demonstrated that ifenprodil significantly reduced cancer cell proliferation, while reserpine triggered apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). The findings highlight the potential of such approaches to disrupt synaptic neurotransmission in the TME.

      To address potential translation of our findings into clinical practice more comprehensively, we have included additional details in the manuscript:

      Section discussion, page 16, lines 338-341:

      “This interaction can be targeted with pan-TRK inhibitors such as entrectinib and larotrectinib. Both drugs are showing promising results in multiple phase II trials, including ovarian cancer and breast cancer patients. Furthermore, a TRKB-specific inhibitor was developed (ANA-12), but has not been subjected to any clinical trials in cancer so far (Ardini et al., 2016; Burris et al., 2015; Drilon et al., 2018, 2017).”

      On page 17, lines 361-374:

      “Strategies to disrupt neuronal signaling and neurotransmitter release in neurons target key elements of excitatory neurotransmission, such as calcium flux and vesicle formation. Drugs like ifenprodil and lamotrigine, commonly used to treat neuronal disorders, block glutamate release and subsequent neuronal signaling. Additionally, the vesicular monoamine transporter (VMAT) inhibitor reserpine prevents synaptic vesicle formation (Reid et al., 2013; Williams, 2001). In vitro studies with HGSOC cell lines have demonstrated that ifenprodil significantly inhibits tumour proliferation, while reserpine induces apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). These approaches hold promise for inhibiting neuronal signaling and interactions in the TME.”

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might have been overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found to be convincing. These analyses were integral to enhancing our findings’ robustness and biological relevance.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for this important positive remark.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the cICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We acknowledge the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we incorporated analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we evaluated correlations between gene signatures from a subset of these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach. Please refer to our response to remark 10 for the results of these analyses.

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment.

      However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumour and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize these patterns may also arise from regulatory processes within a single cell type. To investigate further, we used single-cell transcriptional data (~60,000 cell-type-annotated profiles from GSE158722) and projected our transcriptional components onto these profiles to obtain activity scores, allowing us to assess each TC’s behavior across diverse cellular contexts after removing the first principal component to minimize background effects. Please refer to our response to remark 2.2 in the recommendations to the authors (page 14) for the results of this analysis.
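
      As an illustration only, and not the pipeline used in the study, the projection step described above can be sketched as follows. The sketch assumes that an activity score is the dot product between a profile and a component's gene weights, and that the first principal component is removed from the gene-centered single-cell matrix before projection; both are assumptions, as are the function names and the random toy data.

```python
import numpy as np

def remove_first_pc(expr):
    """Center each gene and subtract the rank-1 part explained by the first PC.

    expr: array of shape (n_cells, n_genes).
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    # SVD of the centered matrix: rows of vt are principal axes in gene space.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    first_pc_part = s[0] * np.outer(u[:, 0], vt[0])
    return centered - first_pc_part

def activity_scores(expr, tc_weights):
    """Project profiles onto transcriptional components (assumed definition).

    tc_weights: array of shape (n_genes, n_components), one column per TC.
    Returns an array of shape (n_cells, n_components).
    """
    return remove_first_pc(expr) @ tc_weights

# Toy data only: 50 hypothetical cells, 20 genes, 3 hypothetical components.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 20))
tc_weights = rng.normal(size=(20, 3))
scores = activity_scores(expr, tc_weights)
```

      Each row of `scores` can then be grouped by the cell-type annotation to compare a component's activity across cell types.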

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78 (12):3233-3242.

      Ardini E, Menichincheri M, Banfi P, Bosotti R, Ponti CD, Pulci R, Ballinari D, Ciomei M, Texido G, Degrassi A, Avanzi N, Amboldi N, Saccardo MB, Casero D, Orsini P, Bandiera T, Mologni L, Anderson D, Wei G, Harris J, Vernier J-M, Li G, Felder E, Donati D, Isacchi A, Pesenti E, Magnaghi P, Galvani A. 2016. Entrectinib, a Pan–TRK, ROS1, and ALK Inhibitor with activity in multiple molecularly defined cancer Indications. Mol Cancer Ther 15:628–639.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Burris HA, Shaw AT, Bauer TM, Farago AF, Doebele RC, Smith S, Nanda N, Cruickshank S, Low JA, Brose MS. 2015. Abstract 4529: Pharmacokinetics (PK) of LOXO-101 during the first-in-human Phase I study in patients with advanced solid tumors: Interim update. Cancer Res 75:4529–4529.

    1. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNAseq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of these other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work. 

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. There are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins may be expressed in the male germline, or Piwi may play roles in male germline progenitors that are not co-located with maturing sperm). We do not currently know why this is so, and we will discuss these possibilities in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the role of a novel gene Aff3ir-ORF2 in flow-induced atherosclerosis. They show that the gene is anti-inflammatory in nature. It inhibits the IRF5-mediated athero-progression by inhibiting the causal factor (IRF5). Furthermore, the authors show a significant connection between shear stress and Aff3ir-ORF2 and its connection to IRF5-mediated athero-progression in different established mouse models, which further validates the ex vivo findings.

      Strengths:

      (1) An adequate number of replicates were used for this study.

      (2) Both in vitro and in vivo validation was done.

      (3) The figures are well presented.

      (4) In vivo causality is checked with cleverly designed experiments.

      We thank you for your positive remarks.

      Weaknesses:

      (1) Inflammatory proteins must be measured with standard methods, e.g., ELISA, as mRNA and protein levels do not always correlate.

      Thanks. We have followed your advice and performed ELISA experiments to measure the concentrations of inflammatory cytokines, including IL-6 and IL-1β. The newly acquired results have been included in Figure 2E (Lines 160-163) in the revised manuscript.

      (2) RNA seq analysis has to be done very carefully. How does the Euclidean distance correlate with the differential expression of genes? Do they represent the neighborhood?

      If they do how does this correlation affect the conclusion of the paper?

      We thank the reviewer for this professional comment and apologize for the confusion. The heatmap using Euclidean distance was generated based on the expression levels of all differentially expressed genes (calculated with DESeq2). Since its interpretation overlaps with the volcano plot presented in Figure 4B, we have moved the heatmap to Figure S5A in the revised manuscript and provided a detailed description in the figure legend (Lines 106-108 in the supporting information). Additionally, to better illustrate the variation among all samples, we have performed PCA and included the new results in Figure 4A of the revised manuscript.

      (3) The volcano plot does not indicate the q value of the shown genes. It is advisable to calculate the q value for each of the genes which represents the FDR probability of the identified genes.

      Thank you for your careful review. We apologize for the incorrect labeling.

      The value shown was the adjusted P value (P.adj). The label for Figure 4B has been corrected in the revised manuscript. 
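
      For readers comparing the reviewer's "q value" with an adjusted P value: in DESeq2 the adjusted P value is, by default, a Benjamini-Hochberg correction, which controls the false discovery rate. The following is a minimal sketch of that correction on toy P values. The function name and input values are hypothetical, and the sketch does not reproduce DESeq2's independent filtering step.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Toy p-values, not taken from the study.
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```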

      (4) Was GO enrichment done against the global gene set or a local gene set? The authors should provide more detailed information about the analysis.

      Thank you. We performed GO enrichment analysis against the global gene set. The description of the results has been updated in the revised manuscript (Lines 222–224).

      (5) If the analysis was performed against a global gene set, how does that connect with this specific atherosclerotic microenvironment?

      Thank you for your insightful comments. We have followed your advice and investigated the functional characteristics of these differentially expressed genes in the context of the atherosclerotic microenvironment. The RNA-seq differential gene list was further mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), resulting in 363 overlapping genes. The 363 genes were subjected to bioinformatics enrichment analysis using Gene Ontology (GO) databases. GO analysis of these genes revealed enrichment in processes related to cell−cell adhesion and leukocyte activation involved in immune response (Figure S5B), which is highly consistent with the observed effects of AFF3ir-ORF2 on VCAM-1 expression. The newly acquired data are presented in Figure S5B and the description of the results is included in the revised manuscript (Lines 227-233).

      (6) What was the basal expression of genes and how did the DGE (differential gene expression) values differ?

      Thanks for the comments. The RNA-sequencing data has been submitted to GEO datasets (GSE286206), making the basal gene expression data available to readers.

      The differential expression analysis was performed using DESeq2 (v1.4.5) (PMID: 25516281) with a criterion of 1.5-fold change and P<0.05. We have included the description in the revised manuscript in Lines 220-222 and Lines 575-576.

      (7) How was IRF5 picked from GO analysis? Was it within the 20 most significant genes?
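
      The stated criterion can be expressed as a simple filter, sketched below on hypothetical rows. Several details are assumptions rather than facts from the response: that the fold-change cutoff applies in both directions and is inclusive, and that it is applied to a log2 fold change. Whether P refers to the raw or the adjusted value is not stated here, so the sketch takes a generic P value. Gene names and numbers are placeholders.

```python
import math

FOLD_CHANGE = 1.5   # criterion quoted in the response
P_CUTOFF = 0.05     # criterion quoted in the response

def is_differential(log2_fold_change, p_value,
                    fold_change=FOLD_CHANGE, p_cutoff=P_CUTOFF):
    """True if a gene changes by at least `fold_change` in either direction
    (assumed symmetric, inclusive) and its p-value is below `p_cutoff`."""
    return (abs(log2_fold_change) >= math.log2(fold_change)
            and p_value < p_cutoff)

# Hypothetical result rows; not data from the study.
results = [
    {"gene": "geneA", "log2FoldChange": 1.2, "p": 0.001},
    {"gene": "geneB", "log2FoldChange": -0.7, "p": 0.03},
    {"gene": "geneC", "log2FoldChange": 0.3, "p": 0.0001},
    {"gene": "geneD", "log2FoldChange": 2.0, "p": 0.2},
]
selected = [r["gene"] for r in results
            if is_differential(r["log2FoldChange"], r["p"])]
```

      Note that a 1.5-fold change corresponds to a log2 fold change of about 0.585, so geneC fails the fold-change test and geneD fails the P value test.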

      Sorry for the confusion. IRF5 was not identified through GO analysis. To determine the upstream transcriptional regulators, we used the ChEA3 database to predict potential upstream transcription factors based on all differentially expressed genes. The top 20 transcription factors were selected based on their scores. To further explore their relationship with atherosclerosis, these top 20 transcription factors were mapped to the atherosclerosis-related gene list in the DisGeNET database. IRF5 and IRF8 were the only two overlapping genes. To clarify this process, we have included a more detailed description of the IRF prediction approach in the revised manuscript (Lines 234–239).
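
      The selection logic described above (take the top 20 predicted transcription factors, then keep those that also appear in a disease-associated gene list) amounts to a ranked-list intersection. The sketch below uses placeholder identifiers only and makes no calls to ChEA3 or DisGeNET; it assumes both lists have already been exported, with the transcription factors ordered best first.

```python
def prioritize(ranked_tfs, disease_genes, top_n=20):
    """Keep the top_n ranked transcription factors that are also in the
    disease-associated gene set, preserving rank order."""
    disease = set(disease_genes)
    return [tf for tf in ranked_tfs[:top_n] if tf in disease]

# Placeholder identifiers; not results from the study.
ranked_tfs = [f"TF{i:02d}" for i in range(1, 26)]      # 25 ranked candidates
disease_genes = {"TF03", "TF17", "TF23", "GENE_X"}     # hypothetical disease list
overlap = prioritize(ranked_tfs, disease_genes)
```

      Here TF23 is in the disease list but ranks below the top 20, so it is excluded from `overlap`.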

      (8) Microscopic studies should be done more carefully. There seems to be a global expression present on the vascular wall for Aff3ir-ORF2 and the expression seems to be similar to AFF3 in Figure 1.

      We thank the reviewer for the valuable suggestion. We have followed your advice and provided more representative images in Figure 1F.

      Reviewer #2 (Public review):

      Summary:

      The authors recently uncovered a novel nested gene, Aff3ir, and this work sets out to study its function in endothelial cells further. Based on differences in expression correlating with areas of altered shear stress, they investigate a role for the isoform Aff3ir-ORF2 in endothelial activation and development of atherosclerosis downstream of disturbed shear stress. Using a knockout mouse model and in vivo overexpression experiments, they demonstrate a strong potential for Aff3ir-ORF2 to alleviate atherosclerosis. They find that Aff3ir-ORF2 interacts with the pro-inflammatory transcription factor IRF5 and retains it in the cytoplasm, hence preventing upregulation of inflammation-associated genes. The data expands our knowledge of IRF5 regulation which could be relevant to researchers studying various inflammatory diseases as well as adding to our understanding of atherosclerosis development.

      Strengths:

      The in vivo data is solid using immunofluorescence staining to assess AFF3ir-ORF2 expression, a knockout mouse model, overexpression and knockdown studies, and rescue experiments in combination with two atherosclerotic models to demonstrate that Aff3ir-ORF2 can lessen atherosclerotic plaque formation in ApoE<sup>-/-</sup> mice.

      We thank you for your positive remarks.

      Weaknesses:

      While the in vivo data is generally convincing, a few data panels have issues and will need addressing. Also, the knockout mouse model will need to be described, since the paper referred to in the manuscript does not actually report any knockout mouse model. Hence it is unclear how Aff3ir-ORF2 is targeted, but Figure S2B shows that targeting is partial, since about 30% expression remains at the RNA level in MEFs isolated from the knockout mice.

      We thank you for the valuable comments. 

      First, we have followed your advice and included detailed information regarding the construction of the knockout mouse model in the revised manuscript in Lines 405-415. Additionally, the genotyping results have been included in new Figure S3A.

      Second, we acknowledge your concern about the knockout efficiency of ORF2 in mice. While the PCR assay indicated approximately 30% residual expression, our Western blot analysis of aorta samples demonstrated that ORF2 protein was barely detectable in knockout mice, as shown in new Figure S3B-C. Besides, our in vivo experiments using MEF from WT and AFF3ir-ORF2<sup>-/-</sup> mice (Figure 4I) further confirmed successful knockout. 

      Third, we have included a discussion addressing the discrepancies between PCR and Western blot results. In addition to technical differences between the two methods, the nature of AFF3ir-ORF2 may also contribute to these inconsistencies. The parent gene AFF3 is located in a genetically variable region and can be excised via intron 5 to form a replicable transposon, which translocates to other chromosomes and has been linked to leukemia (PMID: 34995897, 12203795, 12743608, and 17968322). AFF3ir is located in intron 6 and thus lies within the transposon, which may complicate the measurement of its expression. Replicable transposons can exist as extrachromosomal elements, allowing them to be inherited across generations. We have included this discussion in the revised manuscript in Lines 188-196.

      While the effect on atherosclerosis is clear, the conclusion that this is the result of reduced endothelial cell activation is not supported by the data. The mouse model is described as a global knockout and the shRNA knockdowns (Figure 5) and overexpression data in Figure 2 are not cell type-specific. Only the overexpression construct in Figure 6 uses an ICAM-2 promoter construct, which drives expression in endothelial cells, though leaky expression of this promoter has been reported in the literature. Therefore, other cell types such as smooth muscle cells or macrophages could be responsible for the effects observed.

      Thank you for your critical comment. To address your concern, we have made the following three revisions:

      First, we have analyzed the expression of AFF3ir-ORF2 in the vascular wall with or without intima in WT and AFF3ir-ORF2 knockout mice. As shown in Figure 1B and Figure S1A, while the expression of AFF3ir-ORF2 was notably downregulated in the aortic intima of athero-prone regions compared to the protective region, it remained largely unchanged in the aortic wall without intima across different regions of the aorta. This suggested that AFF3ir-ORF2 might play a predominant role in endothelial cells rather than other cell types in the context of shear stress.

      Second, we have used human endothelial cells (HUVECs) to further confirm our findings. As shown in Figure 2C and Figure S2B, we found that AFF3ir-ORF2 overexpression could attenuate disturbed shear stress-induced IRF5 nuclear translocation and the expression of inflammatory genes in HUVECs, suggesting the potential anti-inflammatory effects of AFF3ir-ORF2 in endothelial cells.

      Third, we agree with the reviewer’s comment that we cannot completely exclude the potential involvement of other cell types. Hence, we have included a limitation statement in the discussion part in Lines 341-344.

      The weakest part of the manuscript is the in vitro experiment using some nonidentifiable expression differences. The data is used to hypothesise on a role for IRF5 in the effects observed with Aff3ir-ORF2 knockout.

      Thank you for the comments. To address your concerns, we have made the following two changes:

      First, we have further investigated the functional features of the differential genes from the RNA-seq in the context of the atherosclerotic microenvironment. The differential gene list was mapped onto the atherosclerosis-related gene dataset (PMID: 27374120), and a total of 363 genes overlapped. These 363 genes were subjected to bioinformatics enrichment analysis using Gene Ontology (GO) databases. GO analysis showed that these genes were mainly enriched in cell−cell adhesion and leukocyte activation involved in immune response, which aligns with the expression of VCAM-1 affected by AFF3ir-ORF2. The newly acquired data are presented in Figure S5B and the description of the results has been updated in the revised manuscript (Lines 227-233).

      Second, we have further verified the RNA-seq results in vitro. We analyzed several classical inflammatory factors, including ICAM-1, CCL5, and CXCL10, whose mRNA levels were significantly downregulated in RNA-seq and which were also identified as target genes of IRF5. We found that AFF3ir-ORF2 deficiency aggravated, while AFF3ir-ORF2 overexpression attenuated, the expression of ICAM-1, CCL5, and CXCL10 induced by disturbed shear stress (New Figure S5D). Besides, the regulation of ICAM-1 by AFF3ir-ORF2 was confirmed at both protein and mRNA levels in HUVECs (Figure 2C-D and Figure S2B). 

      Overall, the paper succeeds in demonstrating a link between Aff3ir-ORF2 and atherosclerosis, but the cell types involved and mechanisms remain unclear. The study also shows a functional interaction between Aff3ir-ORF2 and IRF5 in embryonic fibroblasts, but any relevance of this mechanism for atherosclerosis or any cell types involved in the development of this disease remains largely speculative.

      Thank you for all the valuable comments. The specific responses have been provided above. Briefly, we have followed your advice and further confirmed the regulation of AFF3ir-ORF2 on IRF5 in endothelial cells. Besides, the RNA-seq results have been further analyzed, and partial results have been verified in endothelial cells to support the anti-inflammatory role of AFF3ir-ORF2. We greatly appreciate the reviewer’s insightful comments, which guided our revisions and contributed to significantly improving the paper.

      Reviewer #3 (Public review):

      This study aims to demonstrate the role of Aff3ir-ORF2 in the atheroprone flow-induced EC dysfunction and ensuing atherosclerosis in mouse models. Overall, the data quality and comprehensiveness are convincing. In silico, in vitro, and in vivo experiments and several atherosclerosis models were well executed. To strengthen further, the authors can address human EC relevance.

      We thank you for your positive remarks and insightful comments.

      Major comments:

      (1) The tissue source in Figures 1A and 1B should be clarified, the whole aortic segments or intima? If aortic segment was used, the authors should repeat the experiments using intima, due to the focus of the current study on the endothelium.

      We thank you for the suggestion. The tissue used in Figures 1A and 1B was from aortic intima. The description has been updated for clarity in the revised manuscript on Lines 114-125. 

      (2) Why were MEFs used exclusively in the in vitro experiments? Can the authors repeat some of the critical experiments in mouse or human ECs?

      Thank you for this insightful comment. Isolation and culture of mouse primary aortic ECs are notoriously difficult, and shear stress experiments require a large number of cells. Considering that MEFs exhibit responses consistent with those of ECs, as has been carefully demonstrated (PMID: 23754392), we used MEFs in our in vitro experiments.

      However, following your valuable advice, we have now employed human ECs (HUVECs) to confirm our findings. Consistent with our results in MEFs, we found that AFF3ir-ORF2 overexpression reduced the expression of inflammatory genes induced by disturbed shear stress at both protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). Notably, despite the significant anti-inflammatory effects of AFF3ir-ORF2, the sequence of this gene is not conserved in Homo sapiens and lacks an initiation codon, which is why we did not proceed with the loss-of-function experiments.

      (3) The authors should explain why AFF3ir-ORF2 overexpression did not affect the basal level expression of ICAM-1, VCAM-1, IL-1b, and IL-6 under ST conditions (Figure 2A-C).

      We thank you for raising this critical question. Indeed, we found that AFF3ir-ORF2 overexpression did not affect the basal level of inflammatory genes under ST conditions, while it exerted anti-inflammatory effects under OSS conditions. One underlying reason might be the relatively low expression of inflammatory genes under ST compared to OSS conditions. Additionally, as our findings suggested, AFF3ir-ORF2 exerted its anti-inflammatory role by binding to IRF5 and inhibiting IRF5 nuclear translocation. However, as shown in Figure 4I, IRF5 might be predominantly localized in the cytoplasm rather than the nucleus under ST conditions.

      We have included the description in the revised manuscript on Lines 157-163.

      (4) Please include data from sham controls, i.e., right carotid artery in Figure 2E.

      Thank you for the suggestion. We have followed your advice and included sham controls (staining of the right carotid arteries) in Figure S2E.

      (5) Given that the merit of the study lies in the effect of different flow patterns, the lesion areas in AA and TA (Figure 3B, 3C) should be separately compared.

      We have followed your valuable suggestion and included the additional statistical results in Figure 3C in the revised manuscript.

      (6) For confirmatory purposes for the variations of IRF5 and IRF8, can the authors mine available RNA-seq or even scRNA-seq data on human or mouse atherosclerosis? This approach is important and could complement the current results that are lacking EC data.

      Thank you for your valuable suggestion. In the present study, we found that disturbed flow did not alter the protein level of IRF5 but promoted its nuclear translocation. Following your advice, we analyzed the expression of IRF5 in human ECs (GSE276195) and atherosclerotic mouse arteries (GSE222583) using public databases. Consistently, IRF5 did not show significant changes in mRNA levels under these conditions (Figure S5E-F), suggesting that the regulation of IRF5 in the context of disturbed flow or atherosclerosis is primarily post-translational.

      (7) With the efficacy of using AAV-ICAM2-AFF3ir-ORF2 in atherosclerosis reduction (Figure 6), the authors are encouraged to use lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice to recapitulate its regulation of IRF5.

      We greatly appreciate your valuable suggestion to use lung ECs from mice. We have observed that AFF3ir-ORF2 deficiency enhanced the nuclear translocation of IRF5 induced by OSS. Notably, the transcriptional levels of IRF5 were minimally affected by AFF3ir-ORF2 deficiency. Hence, recapitulating the regulation of IRF5 with lung ECs isolated from the AFF3ir-ORF2<sup>-/-</sup> mice would require treating lung ECs with OSS followed by isolation of subcellular components. However, both in vitro shear stress treatment and subcellular fraction isolation require a large number of cells, and mouse lung ECs are difficult to culture and to maintain over several passages. Therefore, we hope the reviewer understands that these experiments were not performed. As an alternative, we have confirmed the transcriptional activity changes of IRF5 due to AFF3ir-ORF2 manipulation by analyzing the expression of its target genes indicated from RNA-seq results in both the intima of mouse aorta (Figure S5C-D) and HUVECs (Figure 2C-D and Figure S2B). Our findings show that AFF3ir-ORF2 deficiency increases, while its overexpression decreases, the expression levels of IRF5-targeted genes in endothelial cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2H - As I understand it, this is MFI measurement of VCAM. Please change accordingly.

      Thanks. Corrected.

      Reviewer #2 (Recommendations for the authors):

      My major concern is the use of MEFs for all in vitro experiments. All experiments should be done in endothelial cells if the aim is to show a mechanism relevant to endothelial activation and atherosclerosis. Lines 314-316 of the conclusion are absolutely not supported by the data.

      Thank you for the insightful comment. Following your advice, we have employed human ECs (HUVECs) to confirm our findings. Consistent with the findings in MEFs, we found that AFF3ir-ORF2 decreased the expression of inflammatory genes induced by disturbed shear stress, both at protein and mRNA levels in HUVECs (Figure 2C, Figure S2B). 

      Since the in vivo experiments are not cell type-specific, it would be important to test and compare the expression of Aff3ir-ORF2 in endothelial cells as well as smooth muscle and macrophages to support any claim of cell type involvement in the effects observed.

      We thank you for the valuable suggestion. In the revised manuscript, we have followed your suggestion and analyzed the expression pattern of AFF3ir-ORF2 in different regions of the aorta with or without endothelium. We observed a marked reduction in AFF3ir-ORF2 expression in the intima of the aortic arch compared to that in the intima of the thoracic aorta (Figure 1B-C). In contrast, the expression of AFF3ir-ORF2 in the media and adventitia was comparable between the aortic arch and thoracic aorta (Figure S1A-B). These findings provide further evidence supporting the predominant role of endothelial cells. The description has been modified accordingly in the revised manuscript on Lines 121-134.

      The results of the RNA-seq experiment should be disclosed. The experiment should be deposited on GEO or similar and a table of differentially expressed genes added to the manuscript.

      Thank you for the suggestion. We have followed your advice and submitted the RNA-sequencing data to GEO datasets (GSE286206). Besides, a table of differentially expressed genes has been included in the revised manuscript as Table S3.

      Minor comments:

      (1) Figure 1A. Missing the labels of the target.

      Thanks. Corrected. 

      (2) Figure 1D. Cell alignment in AA compared to TA suggests that the image is of the outer curvature, but Figure 1F is showing that the outer curvature is expressing more ORF2 than the inner. Why was the outer curvature chosen for this panel and is it true to conclude on that assumption that expression of ORF2 compares as TA > Outer > Inner curvature?

      We thank you for the insightful suggestion. We have followed your advice and performed en-face immunofluorescence staining of AFF3ir-ORF2 and quantification of AFF3ir-ORF2 expression in AA inner, AA outer, and TA regions. As shown in new Figure 1D-E, the results indeed indicated that expression of AFF3ir-ORF2 compares as TA > AA outer > AA inner.

      (3) Figure 2H. Target mislabelled as ICAM-1 instead of VCAM-1.

      Thanks. Corrected. 

      (4) Figure S1A. VE-cad staining and cell shape differ between control and overexpression. Is this a phenotype or are different areas of the vasculature shown, which would make it hard to interpret since Aff3ir-ORF2 levels differ in different vessel areas?

      We thank the reviewer for raising this important question. For Figure S1A, only common carotid arteries were used for the staining. The potential differences in cell shape observed might be due to variations in the procedure during immunofluorescence staining. To avoid any misinterpretation, more representative images have been provided in the revised Figure S2C.

      (5) Figure 3D-G. Images are not representative of the quantification results.

      Thank you. The images in the revised Figure 3D and Figure 3F have been replaced with more representative ones.

      (6) Line 220. Data for IRF8 are not shown in the figure to support this claim.

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C.

      (7) Figure 6F. AAV-AFF3ir-ORF2 panel order inverted.

      Thanks. Corrected. 

      (8) Line 401. Typo "hat" instead of "h at".

      Sorry for the typo. Corrected.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) The rationale for the following sentence (lines 126-128) is lacking: "Moreover, we observed the expression of AFF3ir-ORF2 in longitudinal sections of the mouse aorta (B. Li et al., 2019)".

      Thanks. The rationale for these experiments has been included in the revised manuscript on Lines 127-129.

      (2) The source of antibodies against AFF3ir-ORF1 and AFF3ir-ORF2 used in western blot and immunostaining experiments was not mentioned in the manuscript.

      Thanks. The antibody information has been included in the Methods section on Lines 456-457 and 510-511.

      (3) The rationale and data interpretation is not clear for the following sentence (lines 220-221): "In addition, neither IRF5 nor IRF8 expression was regulated by AFF3ir-ORF2 (Figure 4F)".

      Thank you for pointing this out. The expression level of IRF8 has been included in Figure S5C. The sentence has been modified accordingly on Lines 253-254.

      (4) The quality of AFF3ir-ORF2 blot in Figure 4I needs improvement.

      Thanks. More representative images have been included in Figure 4I.

      (5) It appears that AFF3ir-ORF2 was present in both cytoplasm and nucleus. Does AFF3ir-ORF2 have a nuclear entry peptide? Also, the nuclear entry of AFF3ir-ORF2 can be enhanced by an immunofluorescence staining experiment.

      Thank you for your insightful comments. Indeed, although we did not observe any significant subcellular changes in the localization of AFF3ir-ORF2 under shear stress conditions, our immunostaining results revealed that AFF3ir-ORF2 is localized in both the cytoplasm and nucleus. To explore whether AFF3ir-ORF2 contains nuclear localization signals, we utilized the NLStradamus tool (http://www.moseslab.csb.utoronto.ca/NLStradamus/) to analyze its sequence. The prediction indicated that AFF3ir-ORF2 lacks a nuclear localization signal.

    1. Author response:

      Reviewer 1: “The authors over-emphasized this study's relevance to RP disease (i.e. patients and mammals are not capable of regeneration like zebrafish).”

      It is true that humans and other mammals are not capable of regeneration. This is why we and many other groups study zebrafish to identify mechanisms of regeneration that successfully form new rods. That said, our previous paper on the molecular basis of retinal remodeling in this zebrafish model system (Santhanam et al., 2023; Cell Mol Life Sci. 2023;80(12):362) revealed remarkable similarities in the stress and physiological responses of rods, cones, RPE and inner retinal neurons to those in mammalian RP models. Thus, we believe this zebrafish is an adequate model of RP and an excellent model to study rod regeneration.

      Reviewer 1: “They under-explained this regeneration's relevance or difference to normal developmental process, which is pretty much conserved in evolution.”  and:

      Reviewer 3: “It would also benefit from integration with single-cell multiome data from developing retinas (Lyu, et al. 2023).”

      It is an excellent suggestion to compare the regenerative response we have studied in a chronic degeneration/regeneration model to the trajectory of developmental rod formation. In Lyu et al. 2023, it was found that while retinal regeneration has similarities to retinal development, it does not precisely recapitulate the same transcription factors and processes. Any differences between this trajectory and that revealed in developmental studies would be enlightening. We intend to do such analyses to add to a revised manuscript in the future.

      Reviewer 2: “Perhaps the authors can consider explaining why the Prdm1a knock-down cells would have a higher Retp1 signal per cell in Fig 9B. Is this a representative picture? This appears to contradict Figure 8's conclusion, although I could tell that the number of Retp1+ cells in the ONL appears to be lower.”

      These are different experimental paradigms.  Figure 8 shows knockdown 48 hours after injection, at which time prdm1a knockdown is affecting rhodopsin expression directly.  That experiment investigated whether prdm1a knockdown affected progenitor proliferation.  Figure 9 shows a time point 6 days after injection, at which time we were asking if prdm1a knockdown affected differentiation of progenitors into rods. 

      Reviewer 2: “The authors noted "Surprisingly, the knockdown of prdm1a resulted in a significantly higher number of rhodopsin-positive cells in the INL (p=0.0293)", while it appears in Figure 9B, 9C that the difference is 2 cells vs 0 in a rightly broader field. It seems to be too strong of a statement for this effect.”

      This was a very unexpected finding.  We included statistics (Figure 9D) to support the finding, so we don’t think it is too strong a statement to make.  Speculation as to what might cause this is fascinating.  Are Muller cells producing progenitors that fail to migrate to the ONL before differentiating into rods?  The lack of BrdU labeling does not support this idea.  Do neurogenic progenitor cells in the INL differentiate towards rods via a pathway that does not require prdm1a?  Perhaps.  Perhaps there are other explanations.

      Reviewer 2: “It appears to this reviewer that the proteomic data didn't reveal much in line with the overall hypothesis or the mechanism, and it's unclear why the authors went for proteomics rather than bulk RNA-seq or ChIP-seq for a transcription factor knock-down experiment. Overall this is a minor point.”

      We agree that bulk RNA sequencing would provide a similar answer, possibly with greater sensitivity.  We chose proteomics for two reasons: 1) We wanted an independent assessment of the knockdown effects that could evaluate whether the knockdowns worked and what pathways were affected.  Since our pathway comparison is to single cell RNAseq data, bulk RNA seq did not seem to be fully independent. 2) Because we used translation-blocking antisense oligos for most knockdown experiments, we did not expect the transcript abundance of the targeted gene to be affected, although these oligos can lead to target transcript degradation.  Thus, we were not likely to be able to validate that our knockdown worked with this technique. 

      Reviewer 3: “The gene regulatory network analysis here would also benefit from the addition of matched scATAC-Seq data, …”

      This is certainly true, and the reviewer points to several studies that have made excellent use of this strategy.  Given the 1-2 year timeline to obtain and analyze such data, it is unlikely that we will be able to incorporate such data in our revised manuscript, but we hope to do so for follow-up studies.

      Reviewer 3: “The description of the time points analyzed is vague, stating only that "fish from 6 to 12 months of age were analyzed". Since photoreceptor degeneration is progressive, it is unclear how progenitor behavior changes over time, or how the gene expression profile of other cell types such as microglia, cones, or surviving rods is altered by disease progression.”

      We have shown in a previous study (Santhanam et al. Cells. 2020;9(10)) that rod degeneration and regeneration are in a steady state from at least 4 to 8 months of age, and in other experiments in the lab at least to 12 months of age.  In this age range, regeneration keeps up with the pace of degeneration, both of which are very fast.  This encompasses the cell types that we specifically study in this manuscript.  The reviewer is right that other cell types could undergo changes.  This is a separate topic of study in the lab.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The objective of this research is to understand how the expression of key selector transcription factors, Tal1, Gata2, Gata3, involved in GABAergic vs glutamatergic neuron fate from a single anterior hindbrain progenitor domain is transcriptionally controlled. With suitable scRNAseq, scATAC-seq, CUT&TAG, and footprinting datasets, the authors use an extensive set of computational approaches to identify putative regulatory elements and upstream transcription factors that may control selector TF expression. This data-rich study will be a valuable resource for future hypothesis testing, through perturbation approaches, of the many putative regulators identified in the study. The data are displayed in some of the main and supplemental figures in a way that makes it difficult to appreciate and understand the authors' presentation and interpretation of the data in the Results narrative. Primary images used for studying the timing and coexpression of putative upstream regulators, Insm1, E2f1, Ebf1, and Tead2 with Tal1 are difficult to interpret and do not convincingly support the authors' conclusions. There appears to be little overlap in the fluorescent labeling, and it is not clear whether the signals are located in the cell soma nucleus.

      Strengths:

      The main strength is that it is a data-rich compilation of putative upstream regulators of selector TFs that control GABAergic vs glutamatergic neuron fates in the brainstem. This resource now enables future perturbation-based hypothesis testing of the gene regulatory networks that help to build brain circuitry.

      We thank Reviewer #1 for the thoughtful assessment and recognition of the extensive datasets and computational approaches employed in our study. We appreciate the acknowledgment that our efforts in compiling data-rich resources for identifying putative regulators of key selector transcription factors (TFs)—Tal1, Gata2, and Gata3—are valuable for future hypothesis-driven research.

      Weaknesses:

      Some of the findings could be better displayed and discussed.

      We acknowledge the concerns raised regarding the clarity and interpretability of certain figures, particularly those related to expression analyses of candidate upstream regulators such as Insm1, E2f1, Ebf1, and Tead2 in relation to Tal1. We agree that clearer visualization and improved annotation of fluorescence signals are crucial to accurately support our conclusions. In our revised manuscript, we will enhance image clarity and clearly indicate sites of co-expression for Tal1 and its putative regulators, ensuring the results are more readily interpretable. Additionally, we will expand explanatory narratives within the figure legends to better align the figures with the results section.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript, the authors seek to discover putative gene regulatory interactions underlying the lineage bifurcation process of neural progenitor cells in the embryonic mouse anterior brainstem into GABAergic and glutamatergic neuronal subtypes. The authors analyze single-cell RNA-seq and single-cell ATAC-seq datasets derived from the ventral rhombomere 1 of embryonic mouse brainstems to annotate cell types and make predictions of where TFs bind upstream and downstream of the effector TFs using computational methods. They add data on the genomic distributions of some of the key transcription factors and layer these onto the single-cell data to get a sense of the transcriptional dynamics.

      Strengths:

      The authors use a well-defined fate decision point from brainstem progenitors that can make two very different kinds of neurons. They already know the key TFs for selecting the neuronal type from genetic studies, so they focus their gene regulatory analysis squarely on the mechanisms that are immediately upstream and downstream of these key factors. The authors use a combination of single-cell and bulk sequencing data, prediction and validation, and computation.

      We also appreciate the thoughtful comments from Reviewer #2, highlighting the strengths of our approach in elucidating gene regulatory interactions that govern neuronal fate decisions in the embryonic mouse brainstem. We are pleased that our focus on a critical cell-fate decision point and the integration of diverse data modalities, combined with computational analyses, has been recognized as a key strength.

      Weaknesses:

      The study generates a lot of data about transcription factor binding sites, both predicted and validated, but the data are substantially descriptive. It remains challenging to understand how the integration of all these different TFs works together to switch terminal programs on and off.

      Reviewer #2 correctly points out that while our study provides extensive data on predicted and validated transcription factor binding sites, clearly illustrating how these factors collectively interact to regulate terminal neuronal differentiation programs remains challenging. We acknowledge the inherently descriptive nature of the current interpretation of our combined datasets.

      In our revision, we will clarify how the different data types support and corroborate one another, highlighting what we consider the most reliable observations of TF activity. Additionally, we will revise the discussion to address the challenges associated with interpreting the highly complex networks of interactions within the gene regulatory landscape.

      We sincerely thank both reviewers for their constructive feedback, which we believe will significantly enhance the quality and accessibility of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance, in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.
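      The ceiling argument above is essentially arithmetic and can be sketched in a few lines. This is a minimal illustration only, not the analysis used in the study: the function names, the multiplicative amplification model and every number below are hypothetical assumptions chosen to make the point visible.

```python
def fold_increase(basal, stimulated):
    """Fold increase of stimulated over basal secretion (same units for both)."""
    if basal <= 0:
        raise ValueError("basal secretion must be positive")
    return stimulated / basal


def stimulated_with_ceiling(basal, amplification, ceiling):
    """Assumed model: the agonist multiplies basal secretion, capped at a ceiling."""
    return min(basal * amplification, ceiling)


# Hypothetical values (% of insulin content); not data from the study.
vehicle_basal = 0.5   # basal secretion, vehicle-treated islets
loaded_basal = 1.0    # basal secretion raised by cholesterol loading
amplification = 3.0   # assumed agonist effect, identical in both conditions

# With a low ceiling, the same agonist effect gives a smaller fold increase
# in the loaded condition, purely because the response saturates.
vehicle = fold_increase(
    vehicle_basal, stimulated_with_ceiling(vehicle_basal, amplification, 2.0))
capped = fold_increase(
    loaded_basal, stimulated_with_ceiling(loaded_basal, amplification, 2.0))

# With no ceiling, the fold increase is preserved despite the higher basal level.
no_ceiling = fold_increase(
    loaded_basal,
    stimulated_with_ceiling(loaded_basal, amplification, float("inf")))

print(vehicle, capped, no_ceiling)
```

      Under these assumptions, a reduced fold increase at unchanged agonist efficacy appears only when the response is close to the ceiling, which is why measuring the maximal secretory capacity helps to distinguish the two explanations.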

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We however agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we have performed binding affinity experiments, which show no differences, in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells (new Supplementary Figure 2D).

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have replaced the term “extracted” with “isolated”. Islet isolation is described in the paper methods section.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion including at lines 308-310 and 315-327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-256: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes.”

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360, which reads:

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics.”

      (6) What role does antigen drive play in these data - for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The Divergent ranges of population fitness do not make sense.

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line, average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct, the evidence linking escape and progression is currently associative. We have now corrected these lines so that they no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at line 147-151, adding Liu et al:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit we have now added to the methods section at line 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307 – 321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious.

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axes were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3.

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model was generated using publicly released viral sequences, as described in a previous publication by Hart et al. 2015. The T/F virus from each subject chronically infected with HCV in our cohort was passed to the model of Hart et al. to estimate the initial viral fitness of the T/F variant. The fitness of the subvariants in the viral population at each subsequent time point was estimated using the same model (for each subtype). For each subject, these subvariant fitness values were divided by the fitness value of the initial T/F virus (hence the relative fitness at the earliest time points, with no mutations in the epitope regions, was 1.000). All other fitness values are therefore expressed relative to the T/F variant.

      We have further clarified this point in the methods section “Estimating survival fitness of viral variant” to better describe how the data of the model was sourced (Lines 465-499).
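
      As a minimal illustration of the normalisation described above (not the authors' code), the sketch below divides each model-estimated fitness value by the T/F value, so that the T/F variant has a relative fitness of 1.000. The function name, the day-indexed dictionary, and all numbers are hypothetical.

```python
from typing import Dict


def relative_fitness(fitness_by_day: Dict[int, float], tf_fitness: float) -> Dict[int, float]:
    """Express each fitness value relative to the transmitted/founder (T/F) virus."""
    if tf_fitness == 0:
        raise ValueError("T/F fitness must be non-zero")
    return {day: value / tf_fitness for day, value in fitness_by_day.items()}


# Hypothetical model outputs for one subject: days post infection -> fitness.
example_fitness = {0: 2.0, 44: 1.5, 120: 2.4}

# The earliest time point is taken here as the T/F virus (an assumption of this sketch).
example_relative = relative_fitness(example_fitness, tf_fitness=example_fitness[0])
```

      Under this convention, values below 1 indicate a loss of fitness relative to the T/F virus and values above 1 indicate a gain.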

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well-tolerated by the virus, and we observed this in our data: immune escape variants tended to revert to variants that were closer to the global consensus strain. Our previous publications have indicated that T/F viruses were variants that were “fit” for transmission between hosts; in cases where the donor was a chronic progressor, a single T/F is often observed. Progression to immune escape and adaptation to chronic infection in the new host has an in-between process of genetic expansion via replication followed by a bottleneck event under immune pressure where overall fitness (overall survivability including replication and exploring immune escape pathways) can change. Under this assumption we questioned whether the observation reported in HIV studies (i.e. mutation landscapes that allow HIV adaptation to host) also happens in HCV infections. Furthermore, the cohort used in this study is rare in that patients were tracked from uninfected, to HCV RNA+, to seroconversion, and finally to either clearance of the virus or progression to chronic infection. It is therefore important to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness of our definition of fitness. We have now clarified this at multiple sections of the paper: In the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories where intrinsic fitness describes the viral fitness without the influence of any immune pressure and effective fitness considers both intrinsic fitness with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of sequence composition of the variant is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force to direct mutational landscape (19)[REF], which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in describing average SE. This has been rectified by adding more information in the methods section (Lines 397 to 400):

      “Briefly, SE was calculated using the frequency of occurrence of SNPs based on per codon position, this was further normalized by the length of the number of codons in the sequence which made up respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE (computed per codon position and normalised by protein length) of the protein region containing the epitope with the time the escape mutation took to reach fixation. We observed that protein regions with a higher rate of variation (i.e. higher average SE) also showed quicker emergence of an immune escape epitope. Because we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already present in the population would alter the association with the rate of escape, particularly as these escape mutations were rarely observed at early time points. The escape variant likely became observable because of host immune pressure; the SE therefore reflects the freedom to explore the mutational landscape. If a region were highly restrictive, such that any mutation would result in a failed variant, we would expect relatively low values of average SE. In other words, the more variability a region tolerates, the greater the probability that the virus finds a route to immune escape.
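
      One way to read the description of average SE is sketched below: Shannon entropy is computed at each codon position from the codon frequencies observed at one time point, and the per-position values are summed and divided by the number of codons in the protein region. This is an illustrative interpretation, not the authors' script; the function names, the use of log base 2, and the toy alignment are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def codon_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the codons observed at one alignment position."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def average_se(aligned_codons: List[List[str]]) -> float:
    """Mean per-codon entropy across a protein region for a single time point."""
    n_codons = len(aligned_codons[0])
    if any(len(seq) != n_codons for seq in aligned_codons):
        raise ValueError("all sequences must have the same number of codons")
    per_position = [
        codon_entropy([seq[i] for seq in aligned_codons]) for i in range(n_codons)
    ]
    # Normalise by the length of the region, measured in codons.
    return sum(per_position) / n_codons


# Toy alignment: four sequences, three codons; only the middle codon varies.
toy_alignment = [
    ["ATG", "GCT", "TTA"],
    ["ATG", "GCT", "TTA"],
    ["ATG", "GTT", "TTA"],
    ["ATG", "GTT", "TTA"],
]
toy_average_se = average_se(toy_alignment)
```

      Repeating the calculation for each time point up to fixation and averaging the results would give one value per protein region to compare against the time to fixation.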

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness has been calculated and normalized in the method section “Estimating survival fitness of viral variants” at line 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for NS5B, NS3 and NS2 region of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies, there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. Author response:

      We have reviewed the helpful feedback from the reviewers and would like to thank them for their careful consideration of our manuscript. By way of provisional response, we agree with many of the above points and plan to revise our manuscript accordingly.

      In an effort to replicate some of the heme trafficking-related experiments in the original paper using a C. elegans model of TDD, we were either unable to do so or demonstrated an alternative explanation for the findings we could partially reproduce. As the reviewers correctly point out, there were some methodological and reagent-related differences between the study by Sun et al. and our own that we will more directly highlight in a subsequent manuscript version. Additionally, where possible, we will attempt to replicate these experiments using the same protocol(s).

      We observed several phenotypic traits in the C. elegans model of TDD that were not described in prior studies. While we believe these features to be consistent with a bioenergetic problem in the worm, direct evidence for this is admittedly lacking in our original manuscript. We are actively engaged in experiments examining potential functions of HRG-9 and HRG-10 unrelated to heme trafficking and will consider which data best align with the scope of this study, thus warranting inclusion in a subsequent manuscript version. We will also provide a more comprehensive review of relevant data generated by other groups (e.g., lipid dysregulation, impaired autophagy, mitochondrial dysfunction in the absence of TANGO2) in the discussion section.

      Recommended improvements related to figure legends, terminology, and formatting will also be executed in our forthcoming version. On behalf of my co-authors and myself, thank you again for your time and effort improving this work.

    1. Author response:

      We thank both reviewers for their time and effort in considering our manuscript. We are pleased that the reviewers recognised the strength of our theoretical analysis and found it "elegant" and "reasonably accessible". We also acknowledge the suggestions made by both reviewers that the manuscript could be improved by more discussion of potential experiments. We were concerned not to make the original manuscript too long but, in the light of the reviewers' comments, we will submit a revised version with more details of the kinds of experiments that would build on the results that we have presented.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present study aims to associate reproduction with age-related disease as support of the antagonistic pleiotropy hypothesis of ageing, predominantly using Mendelian Randomization. The authors found evidence that early-life reproductive success is associated with advanced ageing.

      Strengths:

      Large sample size. Many analyses.

      Weaknesses:

      There are some errors in the methodology that require revisions.

      In particular, the main conclusions drawn by the authors refer to the Mendelian Randomization analyses. However, the authors made a few errors here that need to be reconsidered:

      (1) Many of the outcomes investigated by the authors are continuous outcomes, while the authors report odds ratios. This is not correct and should be revised.

      Thank you for your observation. We have revised the manuscript to ensure that the results for continuous outcomes are appropriately reported using beta coefficients, which indicate the change in the outcome per unit increase in exposure. This will accurately reflect the nature of the analysis and provide a clearer interpretation of continuous outcomes (lines 56-109).

      (2) Some of the odds ratios (for example the one for osteoporosis) are really small, while still reaching the level of statistical significance. After some checking, I found the GWAS data used to generate these MR estimates were processed by the program BOLT-LMM. This program is a linear mixed model program, which requires the transformation of the beta estimates to be useful for dichotomous outcomes. The authors should check the manual of BOLT-LMM and recalculate the beta estimates of the SNP-outcome associations prior to the Mendelian Randomization analyses. This should be checked for all outcomes as it doesn't apply to all.

      Thank you for your detailed feedback. We have reviewed all the GWAS data used in our MR analyses and confirmed that all GWAS of continuous traits have already been processed using BOLT-LMM, including age at menarche, age at first birth, BMI, frailty index, father's age at death, mother's age at death, DNA methylation GrimAge acceleration, age at menopause, eye age, and facial aging. The dichotomous outcomes, with the exception of osteoporosis, have not been processed by BOLT-LMM; these include late-onset Alzheimer's disease, type 2 diabetes, chronic heart failure, essential hypertension, cirrhosis, chronic kidney disease, early onset chronic obstructive pulmonary disease, breast cancer, ovarian cancer, endometrial cancer, and cervical cancer. We have reprocessed the GWAS beta values of osteoporosis and re-conducted the MR analysis (lines 74-75; lines 366-373).
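
      For context, the transformation the reviewer refers to is, as we understand the BOLT-LMM manual, a division of the linear-scale effect size and its standard error by μ(1 − μ), where μ is the case fraction of the binary trait. The sketch below illustrates that rescaling; the function name and input values are hypothetical, and it is not the authors' reprocessing script.

```python
import math
from typing import Tuple


def bolt_lmm_to_log_or(beta: float, se: float, case_fraction: float) -> Tuple[float, float]:
    """Rescale a linear-mixed-model effect on a 0/1 trait to an approximate log odds ratio."""
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    scale = case_fraction * (1.0 - case_fraction)
    # Effect size and standard error are divided by the same factor,
    # so the z-score (and hence the p-value) is unchanged.
    return beta / scale, se / scale


# Hypothetical SNP-outcome association for a trait with 2% cases.
log_or, log_or_se = bolt_lmm_to_log_or(beta=0.002, se=0.0005, case_fraction=0.02)
odds_ratio = math.exp(log_or)
```

      This explains why untransformed estimates can yield odds ratios very close to 1 that are nonetheless highly significant: the effect is on the 0/1 risk scale, not the log-odds scale.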

      (3) The authors should follow the MR-Strobe guidelines for presentation.

      Thank you for your suggestion to follow the MR-STROBE guidelines for the presentation of our study. We appreciate the importance of adhering to these standardized guidelines to ensure clarity and transparency in reporting Mendelian Randomization (MR) analyses. We confirm that the MR components of our research are structured and presented following the MR-STROBE checklist. In addition to the MR analyses, our study also integrates Colocalization analysis, Genetic correlation analysis, Ingenuity Pathway Analysis (IPA), and population validation to provide a more comprehensive understanding of the genetic and biological context. While these analyses are not strictly covered by MR-STROBE guidelines, they complement the MR results by offering additional validation and mechanistic insights.

      We have structured our manuscript to separate these complementary analyses from the core MR results, maintaining alignment with MR-STROBE for the MR-specific components. The additional analyses are discussed in dedicated sections to highlight their unique contributions and avoid conflating them with the MR findings.

      (4) The authors should report data in the text with a 95% confidence interval.

      Thank you for your feedback. We have added the 95% confidence intervals for the reported data within the main text to enhance clarity and provide comprehensive context (lines 56-109). Additionally, the complete analysis data, including all detailed results, can be found in Table S3.
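
      As a brief illustration of the reporting conventions discussed in points (1) and (4), the sketch below computes a Wald-type 95% confidence interval for a beta coefficient (continuous outcome) and for an odds ratio (binary outcome, interval formed on the log scale and then exponentiated). The helper names and example values are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def ci95(beta: float, se: float) -> Tuple[float, float]:
    """Wald 95% confidence interval for an effect estimate on its own scale."""
    return beta - Z_95 * se, beta + Z_95 * se


def or_ci95(log_or: float, se: float) -> Tuple[float, float, float]:
    """Odds ratio with its 95% CI, computed on the log scale and exponentiated."""
    lower, upper = ci95(log_or, se)
    return math.exp(log_or), math.exp(lower), math.exp(upper)


# Continuous outcome: report beta with its interval.
beta_lower, beta_upper = ci95(beta=0.30, se=0.10)

# Binary outcome: report the odds ratio with its interval.
or_point, or_lower, or_upper = or_ci95(log_or=0.0, se=0.1)
```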

      (5) The authors should consider correction for multiple testing

      Thank you for your comment regarding the need to consider correction for multiple testing. We agree that correcting for multiple comparisons is an important step to control for the possibility of false-positive findings, particularly in studies involving large numbers of statistical tests. In our study, we carefully considered the issue of multiple testing and adopted the following approach:

      Context of Multiple Testing: The tests we conducted were hypothesis-driven, focusing on specific relationships (e.g., genetic correlation, colocalization, and Mendelian Randomization). These analyses are based on a priori hypotheses supported by existing literature or biological relevance.

      Statistical Methods: Where applicable, we applied appropriate measures to account for multiple tests. For instance, in Mendelian Randomization, sensitivity analyses serve to validate the robustness of the results.

      We believe that the methodology and corrections applied in our study appropriately address concerns about multiple testing, given the hypothesis-driven nature of our analyses and the rigorous steps taken to validate our findings. If you feel that additional corrections are required for specific parts of the analysis, we would be happy to further clarify or revise as needed.
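
      For readers unfamiliar with the corrections the reviewer has in mind, the sketch below shows two common options, Bonferroni and Benjamini-Hochberg, in plain Python. It is illustrative only: the p-values are invented, and the response does not state that either procedure was applied.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for four exposure-outcome tests.
example_pvals = [0.01, 0.04, 0.03, 0.20]
bonf_adjusted = bonferroni(example_pvals)
bh_adjusted = benjamini_hochberg(example_pvals)
```

      Bonferroni is the more conservative of the two; Benjamini-Hochberg is often preferred when many correlated outcomes are tested.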

      Reviewer #2 (Public review):

      Summary:

      The authors present an interesting paper where they test the antagonistic pleiotropy theory. Based on this theory they hypothesize that genetic variants associated with later onset of age at menarche and age at first birth have a positive causal effect on a multitude of health outcomes later in life, such as epigenetic aging and prevalence of chronic diseases. Using a Mendelian randomization and colocalization approach, the authors show that SNPs associated with later age at menarche are associated with delayed aging measurements, such as slower epigenetic aging and reduced facial aging, and a lower risk of chronic diseases, such as type 2 diabetes and hypertension. Moreover, they identified 128 fertility-related SNPs that are associated with age-related outcomes and they identified BMI as a mediating factor for disease risk, discussing this finding in the context of evolutionary theory.

      Strengths:

      The major strength of this manuscript is that it addresses the antagonistic pleiotropy theory in aging. Aging theories are not frequently empirically tested although this is highly necessary. The work is therefore relevant for the aging field as well as beyond this field, as the antagonistic pleiotropy theory addresses the link between fitness (early life health and reproduction) and aging.

      Points that have to be clarified/addressed:

      (1) The antagonistic pleiotropy is an evolutionary theory pointing to the possibility that mutations that are beneficial for fitness (early life health and reproduction) may be detrimental later in life. As it concerns an evolutionary process and the authors focus on contemporary data from a single generation, more context is necessary on how this theory is accurately testable. For example, why and how much natural variation is there for fitness outcomes in humans?

      Thank you for these insightful questions. We appreciate the opportunity to clarify how we approach the testing of AP theory within a contemporary human cohort and address the evolutionary context and comparative considerations with the disposable soma theory.

      We recognize that modern human populations experience selection pressures that differ from those in the past, which may affect how well certain genetic variants reflect historical fitness benefits. Nonetheless, the genetic variation present today still offers valuable insights into potential AP mechanisms through statistical associations in contemporary cohorts. We believe that AP can indeed be explored in current populations by examining genetic links between reproductive traits and age-related health outcomes. In our study, we investigate whether certain genetic variants linked to reproductive timing—such as age at menarche and age at first birth—also correlate with late-life health risks. By identifying SNPs associated with both early-life reproductive success and adverse aging outcomes, we aim to capture the evolutionary trade-offs that AP theory suggests.

      Despite contemporary selection pressures that differ from historical conditions, there remains natural genetic variation in traits like reproductive timing and longevity in humans today. This diversity allows us to apply MR to test causal relationships between reproductive traits and aging outcomes, providing insights into potential AP mechanisms. Prior studies have demonstrated that reproductive behaviors exhibit significant heritability and have identified genetic loci associated with reproductive timing (1,2). This genetic variation facilitates causal inference in modern cohorts, despite environmental and healthcare advances that might modulate these associations (3). By leveraging genetic risk scores for reproductive timing, our study captures the necessary variability to assess potential AP effects, thus providing valuable insights into how evolutionary trade-offs may continue to influence human health outcomes.

      What do the genetic risk score distributions of the exposure data look like?

      Thank you for your question. Our study is focused on Mendelian Randomization (MR) analysis, which aims to infer causal relationships between exposures and outcomes. While genetic risk scores (GRS) provide valuable insights at an individual level, they do not directly align with our study's objective, which is centered on population-level causal inference rather than individual-level genetic risk assessment. In MR, we use genetic variants as instrumental variables to determine the causal effect of an exposure on an outcome. GRS analysis typically focuses on summarizing an individual's risk based on multiple genetic variants, which is outside the scope of our current research. Therefore, we did not perform or analyze the distribution of genetic risk scores, as our primary goal was to understand broader causal relationships using established genetic instruments.

      Also, how can the authors distinguish in their data between the antagonistic pleiotropy theory and the disposable soma theory, which considers a trade-off between investment in reproduction and somatic maintenance and can be used to derive similar hypotheses? There is just a very brief mention of the disposable soma theory in lines 196-198.

      In our manuscript, we test AP theory specifically by examining genetic variants associated with reproductive timing and their association with age-related health risks in later life. MR and genetic risk scores allow us to assess these associations, directly testing the hypothesis that certain alleles enhancing reproductive success might have adverse effects on aging outcomes. This gene-centered approach aligns with AP’s premise of genetic trade-offs, enabling us to observe whether alleles associated with early-life reproductive traits correlate with increased risks of age-related diseases. Distinguishing from disposable soma theory, which would predict a general trade-off in energy allocation affecting somatic maintenance and not specific genetic effects, our data focuses on how certain alleles have differential impacts across life stages. Our findings thus support AP theory over disposable soma by highlighting the effects of specific genetic loci on both reproductive and aging phenotypes. However, future research could indeed explore the intersection of these theories, for example, by examining how resource allocation and genetic predispositions interact to influence longevity in various environmental contexts.

      (2) The antagonistic pleiotropy theory, used to derive the hypothesis, does not necessarily distinguish between male and female fitness. Would the authors expect that their results extrapolate to males as well? And can they test that?

      Emerging evidence suggests that early puberty in males is linked to adverse health outcomes, such as an increased risk of cardiovascular disease, type 2 diabetes, and hypertension in later life (4). A Mendelian randomization study also reported a genetic association between the timing of male puberty and reduced lifespan (5). These findings support the hypothesis that genetic variants associated with delayed reproductive timing in males might similarly confer health benefits or improved longevity, akin to the patterns observed in females. This would suggest that similar mechanisms of antagonistic pleiotropy could operate in males as well.

      In our study, BMI was identified as a mediator between reproductive timing and disease risk. Given that BMI is a common risk factor for age-related diseases in both males and females (6-9), it is plausible that similar mechanisms involving BMI, reproductive timing, and disease risk could exist in males. This shared mediator points to the possibility that, while reproductive timelines may differ, the pathways through which these traits influence aging outcomes may be consistent across genders.

      AP theory could potentially be tested in males, as the principles of the theory may extend to analogous reproductive traits in males, such as age at puberty and testosterone levels, which could similarly influence health outcomes later in life. However, as our current study focuses specifically on female reproductive traits, testing the AP theory in males is outside the scope of this work. We acknowledge the importance of exploring these mechanisms in males, and we hope that future research will address this by investigating male-specific reproductive traits and their relationship to aging and health outcomes.

      (3) There is no statistical analyses section providing the exact equations that are tested. Hence it's not clear how many tests were performed and if correction for multiple testing is necessary. It is also not clear what type of analyses have been done and why they have been done. For example in the section starting at line 47, Odds Ratios are presented, indicating that logistic regression analyses have been performed. As it's not clear how the outcomes are defined (genotype or phenotype, cross-sectional or longitudinal, etc.) it's also not clear why logistic regression analysis was used for the analyses.

      Thank you for your thoughtful comments regarding the statistical analyses and the clarification of methods and variables used in the study.

      Statistical Analyses Section: We have included a detailed explanation of all statistical analyses in the Methods section (lines 291–408), specifying the rationale for the choice of methods, the variables analyzed, and their relationships. Additionally, we have provided the relevant equations or statistical models used where appropriate to ensure transparency.

      Beta Values and Odds Ratios: In the Results section (starting at line 56), both Beta values and Odds Ratios are presented: Beta values were used for analyses of continuous outcomes to quantify the linear relationship between predictors and outcomes. Odds Ratios (ORs) were calculated for binary or categorical disease outcomes to describe the relative odds of an outcome given specific exposures or independent variables.

      Validation and Regression Analyses: For further validation of the MR results, we conducted analyses using the UK Biobank dataset (starting at line 162). Logistic regression analysis was then employed for disease risk assessments involving categorical outcomes (e.g., diseased or not).

      We hope that this clarifies the methods and their applicability to our study, as well as the rationale for the presentation of Beta values and Odds Ratios. If further details or refinements are required, we are happy to incorporate them.
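      The relationship between the Beta values and Odds Ratios described above can be sketched as follows. This is a minimal illustration that assumes a logistic-regression or MR estimate on the log-odds scale; the function name and the input numbers are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and its standard error to an OR with a 95% CI."""
    point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, lower, upper

# Hypothetical log-odds estimate (not a result from the manuscript).
or_, lo, hi = odds_ratio_with_ci(beta=-0.10, se=0.03)
print(f"OR = {or_:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

      For continuous outcomes the Beta value is reported directly, so no exponentiation is needed.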

      (4) Mendelian Randomization is an important part of the analyses done in the manuscript. It is not clear to what extent the MR assumptions are met, how the assumptions were tested, and if/what sensitivity analyses are performed; e.g. reverse MR, biological knowledge of the studied traits, etc. Can the authors explain to what extent the genetic instruments represent their targets (applicable expression/protein levels) well?

      Thank you for your insightful comments regarding the Mendelian Randomization (MR) analysis and the evaluation of its assumptions. Below, we provide additional clarification on how the MR assumptions were addressed, sensitivity analyses performed, and the representativeness of the genetic instruments (starting at line 314):

      Relevance Assumption (Genetic instruments are associated with the exposure): “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000).” “During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12).”

      Independence Assumption (Genetic instruments are not associated with confounders, Genetic instruments affect the outcome only through the exposure): Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded.

      Sensitivity Analyses Performed: A pleiotropy test was used to check whether the IVs influence the outcome through pathways other than the exposure of interest. A heterogeneity test was applied to assess whether the causal effect estimates vary across different IVs. Significant heterogeneity test results indicate that some instruments are invalid or that the causal effect varies depending on the IVs used. MR-PRESSO was applied to detect and correct potential outliers of IVs with NbDistribution = 10,000 and threshold p = 0.05. Outliers would be excluded for repeated analysis. The causal estimates were given as odds ratios (ORs) and 95% confidence intervals (CI). A leave-one-out analysis was conducted to ensure the robustness of the results by sequentially excluding each IV and confirming the direction and statistical significance of the remaining SNPs.

      Supplemental post-GWAS analysis: Colocalization analysis (starting at line 356), Genetic correlation analysis (starting at line 366).

      Our MR analysis adheres to the guidelines for causal inference in MR studies. By combining multiple sensitivity analyses and ensuring the quality of genetic instruments, we demonstrate that the results are robust and unlikely to be driven by confounding or pleiotropy.
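      The heterogeneity and leave-one-out checks listed above can be illustrated with a short sketch. The response does not name the estimators used, so the fixed-effect inverse-variance weighted (IVW) estimate and Cochran's Q below are assumptions chosen because they are common in two-sample MR; all function names and summary statistics are hypothetical.

```python
import math

def wald_ratios(bx, by, se_y):
    """Per-SNP causal estimates (by / bx) with first-order standard errors."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_y)]
    return ratios, ses

def ivw(bx, by, se_y):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    ratios, ses = wald_ratios(bx, by, se_y)
    weights = [1.0 / s**2 for s in ses]
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return est, math.sqrt(1.0 / sum(weights))

def cochran_q(bx, by, se_y):
    """Cochran's Q heterogeneity statistic (degrees of freedom = number of SNPs - 1)."""
    ratios, ses = wald_ratios(bx, by, se_y)
    est, _ = ivw(bx, by, se_y)
    return sum(((r - est) / s) ** 2 for r, s in zip(ratios, ses))

def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP dropped in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        est, _ = ivw([bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep])
        estimates.append(est)
    return estimates

# Hypothetical SNP-exposure (bx) and SNP-outcome (by, se_y) associations.
bx = [0.10, 0.08, 0.12, 0.09]
by = [-0.020, -0.018, -0.022, -0.017]
se_y = [0.005, 0.006, 0.005, 0.007]

estimate, se = ivw(bx, by, se_y)
print("IVW estimate:", estimate, "SE:", se)
print("Cochran's Q:", cochran_q(bx, by, se_y))
print("Leave-one-out estimates:", leave_one_out(bx, by, se_y))
```

      If the leave-one-out estimates keep the same sign and similar magnitude, no single instrument is driving the result. Outlier correction with MR-PRESSO and the pleiotropy test are not reproduced in this sketch.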

      (5) It is not clear what reference genome is used and if or what imputation panel is used. It is also not clear what QC steps are applied to the genotype data in order to construct the genetic instruments of MR.

      Starting at line 314, the SNP selection steps are described in the Methods section. “We identified single nucleotide polymorphisms (SNPs) associated with exposure datasets with p < 5 × 10<sup>-8</sup> (10,11). In this case, 249 SNPs and 67 SNPs were selected as eligible instrumental variables (IVs) for exposures of age at menarche and age at first birth, respectively. All selected SNPs for every exposure would be clumped to avoid the linkage disequilibrium (r<sup>2</sup> = 0.001 and kb = 10,000). Then we identified whether there were potential confounders of IVs associated with the outcomes based on a database of human genotype-phenotype associations, PhenoScanner V2 (13,14) (http://www.phenoscanner.medschl.cam.ac.uk/), with a threshold of p < 1 × 10<sup>-5</sup>. IVs associated with education, smoking, alcohol, activity, and other confounders related to outcomes would be excluded. During the harmonization process, we aligned the alleles to the human genome reference sequence and removed incompatible SNPs. Subsequent analyses were based on the merged exposure-outcome dataset. We calculated the F statistics to quantify the strength of IVs for each exposure with a threshold of F>10 (12). If the effect allele frequency (EAF) was missing in the primary dataset, EAF would be collected from dbSNP (https://www.ncbi.nlm.nih.gov/snp/) based on the population to calculate the F value.” The SNP numbers of exposures for each outcome and the F statistics results are listed in Supplemental Table S2.
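      The significance and instrument-strength filters quoted above can be sketched as follows. The exact F formula used in the manuscript is not stated here; the approximation below, which needs the EAF, is one common choice and is an assumption, as are the function names, SNP identifiers, and sample size. LD clumping is omitted because it requires a reference panel.

```python
def f_statistic(beta, eaf, n):
    """Approximate per-SNP F statistic from summary statistics.

    Assumes a standardized exposure, so the variance explained is
    R^2 = 2 * EAF * (1 - EAF) * beta^2 and F = R^2 * (N - 2) / (1 - R^2).
    """
    r2 = 2.0 * eaf * (1.0 - eaf) * beta**2
    return r2 * (n - 2) / (1.0 - r2)

def select_instruments(snps, n, p_threshold=5e-8, f_threshold=10.0):
    """Keep SNPs that pass both the genome-wide p-value and the F > 10 filters."""
    kept = []
    for snp in snps:
        strong = f_statistic(snp["beta"], snp["eaf"], n) > f_threshold
        if snp["p"] < p_threshold and strong:
            kept.append(snp["rsid"])
    return kept

# Hypothetical summary statistics; identifiers and values are placeholders.
snps = [
    {"rsid": "rs_example_1", "beta": 0.05, "eaf": 0.30, "p": 1e-12},
    {"rsid": "rs_example_2", "beta": 0.01, "eaf": 0.45, "p": 3e-6},
    {"rsid": "rs_example_3", "beta": 0.04, "eaf": 0.10, "p": 2e-9},
]
selected = select_instruments(snps, n=200_000)
print(selected)
```

      In this toy input the second SNP is dropped because its p-value does not reach genome-wide significance.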

      (6) A code availability statement is missing. It is understandable that data cannot always be shared, but code should be openly accessible.

      We have added it to the manuscript (starting at line 410).

      Reviewer #2 (Recommendations for the authors):

      (1) The outcomes seem to be genotypes (lines 274-288). In MR, genotypes are used as an instrument, representing an exposure, which is then associated with an outcome that is typically observed and measured at a later moment in time than the predictors. If both exposure and outcome are genotypes it is not clear how this works in terms of causality; it would rather reflect a genetic correlation. One would expect the genotypes that function as instruments for the exposure to have a functional cascade of (age-related) effects, leading to an (age-related) outcome. From line 149 the outcomes seem to be phenotypes. Can the authors please clearly explain in each section what is analyzed, how the analyses were done, and why the analyses were done that way?

      Thank you for your insightful comment. We understand the concern regarding the use of genotypes as both exposures and outcomes and the implications this has for interpreting causality versus genetic correlation. To clarify, in our study, the outcomes analyzed in the MR framework are indeed genotypes, starting from line 47. We use genotypes as instrumental variables for exposures, which are then linked to phenotypic outcomes observed at a later stage, in line with standard MR principles.

      To improve the robustness of the MR results, we validated the genetic associations in the population with phenotype data from UK Biobank (lines 162-203), and the detailed methods were listed in lines 385-408.

      (2) Overall, the English writing is good. However, some small errors slipped in. Please check the manuscript for small grammar mistakes like in sentences 10 (punctuation) and 33 (grammar).

      Thank you for your feedback. We appreciate your careful review and attention to detail. We thoroughly rechecked the manuscript for grammatical errors, including punctuation and sentence structure, especially in sentences 11 and 35 of the revised manuscript, as suggested.

      (3) There is currently no results and discussion section.

      The manuscript was submitted as Short Reports article type with a combined Results and Discussion section. We have added the section title of Discussion.

      (4) Why did the authors not include SNPs associated with age at menopausal onset? See, for example: https://www.nature.com/articles/s41586-021-03779-7

      Thank you for this information. Our manuscript focuses on the antagonistic pleiotropy theory, which posits an inherent trade-off in natural selection, where genes beneficial for early survival and reproduction (such as those influencing menarche and childbirth) may have costly consequences later. We therefore included only age at menarche and age at first childbirth as exposures in our research.

      (5) Can the authors include genetic correlations between menarche, age at first child, BMI, and preferably menopause?

      Thank you for your suggestion. We acknowledge that including genetic correlations between age at menarche, age at first childbirth, BMI, and menopause can provide valuable context to our analysis. While our current MR study sets age at menarche and age at first childbirth as exposures and menopause as the outcome, and we have already included results that account for BMI-related SNPs before and after correction, we recognize the importance of assessing genetic correlations.

      To address this, we calculated the genetic correlations between these traits to provide insight into their shared genetic architecture. This analysis helps clarify whether there is a significant genetic overlap between the two exposures and between exposure and outcome, which can inform and support the interpretation of our MR results. We appreciate your suggestion and have included these calculations to enhance the robustness and comprehensiveness of our study. In the genetic correlation analysis, LDSC software was applied, and the genetic correlation values for all pairwise comparisons among age at menarche, age at first birth, BMI, and age at menopause onset were calculated (15,16). The results are listed in Table S6.

      (6) Line 39-40: that is not entirely true. There is also mounting evidence that socioeconomic factors cause earlier onset of menarche through stress-related mechanisms: https://doi.org/10.1016/j.annepidem.2010.08.006

      Thank you so much for your information. We changed it to “Considering reproductive events are partly regulated by genetic factors that can manifest the physiological outcome later in life”.

      (7) Why did the authors choose to work with studies derived from IEU Open GWAS, given that it often does not contain the most recent and relevant GWAS for a specific trait?

      We chose to work with studies derived from the IEU Open GWAS database after careful consideration of several sources, including the GWAS Catalog database and recently published GWAS papers. Our selection criteria focused on publicly available GWAS with large sample sizes and a higher number of SNPs to ensure robust analysis. For specific traits such as late-onset Alzheimer's disease and eye aging, we used GWAS data published in scientific articles to ensure that our research reflects the latest findings in the field.

      (1) Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat Genet 48, 1462-1472 (2016). https://doi.org/10.1038/ng.3698

      (2) Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat Hum Behav 1, 757-765 (2017). https://doi.org/10.1038/s41562-017-0195-1

      (3) Stearns, S. C., Byars, S. G., Govindaraju, D. R. & Ewbank, D. Measuring selection in contemporary human populations. Nat Rev Genet 11, 611-622 (2010). https://doi.org/10.1038/nrg2831

      (4) Day, F. R., Elks, C. E., Murray, A., Ong, K. K. & Perry, J. R. Puberty timing associated with diabetes, cardiovascular disease and also diverse health outcomes in men and women: the UK Biobank study. Sci Rep 5, 11208 (2015). https://doi.org/10.1038/srep11208

      (5) Hollis, B. et al. Genomic analysis of male puberty timing highlights shared genetic basis with hair colour and lifespan. Nat Commun 11, 1536 (2020). https://doi.org/10.1038/s41467-020-14451-5

      (6) Field, A. E. et al. Impact of overweight on the risk of developing common chronic diseases during a 10-year period. Arch Intern Med 161, 1581-1586 (2001). https://doi.org/10.1001/archinte.161.13.1581

      (7) Singh, G. M. et al. The age-specific quantitative effects of metabolic risk factors on cardiovascular diseases and diabetes: a pooled analysis. PLoS One 8, e65174 (2013). https://doi.org/10.1371/journal.pone.0065174

      (8) Kivimaki, M. et al. Obesity and risk of diseases associated with hallmarks of cellular ageing: a multicohort study. Lancet Healthy Longev 5, e454-e463 (2024). https://doi.org/10.1016/S2666-7568(24)00087-4

      (9) Kivimaki, M. et al. Body-mass index and risk of obesity-related complex multimorbidity: an observational multicohort study. Lancet Diabetes Endocrinol 10, 253-263 (2022). https://doi.org/10.1016/S2213-8587(22)00033-X

      (10) Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 50, 912-919 (2018). https://doi.org/10.1038/s41588-018-0152-6

      (11) Gao, X. et al. The bidirectional causal relationships of insomnia with five major psychiatric disorders: A Mendelian randomization study. Eur Psychiatry 60, 79-85 (2019). https://doi.org/10.1016/j.eurpsy.2019.05.004

      (12) Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 26, 2333-2355 (2017). https://doi.org/10.1177/0962280215597579

      (13) Staley, J. R. et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics 32, 3207-3209 (2016). https://doi.org/10.1093/bioinformatics/btw373

      (14) Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics 35, 4851-4853 (2019). https://doi.org/10.1093/bioinformatics/btz469

      (15) Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236-1241 (2015). https://doi.org/10.1038/ng.3406

      (16) Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47, 291-295 (2015). https://doi.org/10.1038/ng.3211

    1. Author response:

      We thank the reviewers for their thoughtful comments and suggestions. We plan to make a number of revisions to the manuscript to address their feedback.

      Firstly, we plan to incorporate feedback related to our modeling approach. We will provide justification for the chosen models and why this dataset is not appropriate for an in-depth exploration of other models. In particular, we will highlight that the models included in this manuscript were taken from Langdon et al. (2019) with a minor extension. Model development and validation in the Langdon et al. (2019) paper required a dataset with >100 rats per task. As the current n per variant is 28-32, and behavioral performance on this task is highly variable, it would be difficult to sufficiently test the validity of models that majorly depart from the previously tested RL models. Nevertheless, we will acknowledge this as a limitation in the discussion section. Additionally, we will test some alternatives suggested by reviewers that fall within the scope of the current RL modeling framework (e.g., comparison to a standard delta-rule update for unrewarded choices). We will address other concerns brought up by reviewers by a.) providing a rationale for why we constrained our analyses to the first five sessions, b.) simulating data for sessions that match those that were analyzed in the real data (i.e., sessions 35-40 instead of 18-20), and c.) including a figure of the simulated choice probabilities rather than just risk score.
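      For reference, the "standard delta-rule update" mentioned above can be sketched as follows. This is a generic textbook form and not the Langdon et al. (2019) model or its extension; the four-option layout, the coding of an unrewarded trial as a negative outcome, and all parameter values are assumptions made for illustration.

```python
import math

def delta_rule_update(q, choice, outcome, alpha):
    """Standard delta rule: move the chosen option's value toward the outcome."""
    q = list(q)  # copy so the caller's list is not modified
    q[choice] += alpha * (outcome - q[choice])
    return q

def softmax(q, beta):
    """Choice probabilities from option values with inverse temperature beta."""
    m = max(q)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (v - m)) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical trial sequence on a four-option task.
q = [0.0, 0.0, 0.0, 0.0]
q = delta_rule_update(q, choice=1, outcome=1.0, alpha=0.1)   # rewarded trial
q = delta_rule_update(q, choice=3, outcome=-1.0, alpha=0.1)  # unrewarded trial coded as a loss
probs = softmax(q, beta=3.0)
print(q, probs)
```

      Under this rule the same learning rate applies to rewarded and unrewarded choices, which is what makes it a natural baseline against models that treat the two outcome types differently.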

      Secondly, we will include additional analyses and clarify the current statistical approach to address comments on how the data were analyzed. We will include an analysis of task acquisition to investigate when choice preferences emerge across the different variants. We will justify the statistical approach used for detecting behavioral differences between task variants, including a better explanation of the inclusion of the risky/optimal label as a between-subjects factor in the ANOVAs. We will also expand the section on parameters predicting risk preference on the rGT to fully explain the statistical method used and provide a figure of the results.

      Lastly, we will provide a more detailed rationale for the reinforcer devaluation test, and describe the hypothesis it tests. We will also expand on how the results from the devaluation test support our conclusions, and address alternative explanations suggested by the reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1: 

      (1) As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate.

      I see no reason to include these inaccurate measurements.  

      We agree with the Referee and removed the indicated figure (old Supplementary Fig. 4) and data.

      Reviewer #3:

      (1) It would be interesting to comment on how the addition of a coverslip changes the performance of the uncorrected microendoscope compared to the use of bare grin lenses. 

      We modified the Discussion section (page 18) and added a new reference (#36) to address the Referee's request.

      (2) In Figure 6C-H, the authors can indeed show data corresponding to all detected cells, but I still think that the statistics should be calculated using the same effective FOV. 

      We modified the Figure 6 legend to address the Referee's request.

      (3) Authors could present the images in Figures 4-6 as in the original version, with a scale bar in the centre of the FOV that is different for the two types of objectives (corrected vs uncorrected). They could add a short justification for this choice, and perhaps present the other version for Figure 4 in a supplementary information sheet (with similar scale bars at the centre of the FOV for both types of objectives). It would allow readers to appreciate that the FOV still appears significantly enlarged with this other presentation.

      As requested by the Referee, we modified the text in the Results section (page 11) and added the additional version of Figure 4 as Figure 4-figure supplement 1.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding the penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that the long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding regarding prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the manuscript only shows inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations, especially when using the yellow laser as is done in this work (see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the reviewer’s thorough evaluation, which thoughtfully highlights the strengths and areas for improvement in our study.

      We agree with the reviewer’s recognition of the novelty of our approach, particularly in specifically perturbing climbing fiber (CF) activity in the flocculus and examining its effects across distinct phases of learning. Additionally, our use of the well-established OKR behavior paradigm provides a robust framework for investigating cerebellar learning processes, further strengthening our study.

      To address concerns regarding the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we conducted additional experiments. These include in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase. To ensure precise targeting and mitigate potential side effects, such as unintended modification of Purkinje cell (PC) simple spike activity, we demonstrated that optogenetic suppression of CF transmission did not affect simple spike firing. Furthermore, we made additional characterizations to confirm the specificity of viral targeting.

      Lastly, we recognize the importance of exploring alternative mechanisms underlying CF involvement in cerebellar learning. Accordingly, we expanded the manuscript to provide a more comprehensive discussion of these mechanisms, offering a clearer perspective on the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CF in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in-vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is eight times stronger than the intensity we used in our OKR experiment. Furthermore, if there had been tissue damage from chronic laser stimulation, we would expect to see impaired long-term memory reflected in abnormal gain retrieval results tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Inhibitory optogenetic actuators are generally problematic, especially in time frames longer than seconds. If the authors wish to be able to inhibit activity in the flocculus-targeting CFs for a long time, maybe it would make sense to try to retrogradely transfect the IO neurons from the flocculus (using a cre-lox approach) with inhibitory DREADDs. This approach is also full of problems, so the absence or significant decrease in CS activity throughout the period of manipulation must be demonstrated.

      In addition to re-examining the strength of the evidence regarding the role of CFs in the consolidation and retrieval phases, the manuscript would benefit from significant reworking of its details and figures. Below is a possibly incomplete list of things we would want to highlight:

      (1) While the text states the authors "... verified the potential reduction of Cs firing rate in PCs of awake mice in vivo by inhibiting CF signals", neither the data nor a figure is shown. This is of critical importance when judging the reliability of the following results. The data presented in panels Figure 1D-E should also be improved to be more informative; specifically, the waveforms of EPSCs should be shown in higher resolution. We are not informed about how many cells/slices/animals the results are obtained from, nor how many trials were done per condition. Finally, the in vitro data are from vermal Purkinje neurons, while the focus of the work is on the flocculus. Please provide these verifications for the flocculus.

      To verify the suppression of complex spike (Cs) activity, we conducted additional in-vivo experiments and added Figure 2, which presents recordings of Cs firing rates from Purkinje cells (PCs) during optogenetic suppression of climbing fiber (CF) activity. These data demonstrate that the suppression specifically and robustly targets Cs activity without affecting simple spike firing, as shown in Figure 2C. The results presented in Figure 2 were acquired at 40 minutes of optostimulation, consistently showing effective suppression of Cs activity throughout this period. While continuous recordings over several hours were not performed, the stability and sustained suppression observed at the 40-minute mark strongly suggest that the manipulation remains effective during the extended durations required for the behavioral tests.

      Additionally, we have improved Figure 1D by enhancing the resolution of EPSC waveforms and including more detailed information in the figure legend regarding the number of cells and animals analyzed. For the current-clamp mode data (Figures 1E and F), we clarified the experimental conditions to provide additional context. While the in vitro data were collected from vermal PCs, these experiments were intended to illustrate the fundamental properties of CF-PC transmission.

      (2) It is challenging to get a homogenous transfection of all CFs in a given region. To be able to judge the significance of the results, the readers should be provided with material allowing assessing the transfection quality. The images shown in panels Bi-ii are spatially restricted and of too low quality to make judgements. Also, it is not stated whether the images shown are from GFP or NpHR-transfected animals. These different payloads are delivered using different viral capsids (AAV1 vs. AAV9) that have significantly different transfection capacities and results from AAV9-CamKIIGFP cannot be generalized to AAV1-CamKII-NpHR. Please show the expression for the capsid used with NpHR.

      To clarify, the images in Figure 1Bi-ii are representative of GFP expression in animals transfected using AAV1-CamKII-EGFP. The purpose of these panels is to confirm the successful targeting of the region of interest rather than to evaluate viral tropism or capsid-specific transfection efficiency. Moreover, while the transfection characteristics of AAV1 and AAV9 may differ, the key experimental parameter of effective CF suppression was validated through in-vivo electrophysiological recordings, which robustly confirm the efficacy of NpHR expression.

      (3) Finally, please show the location of the optic fiber implant in the flocculus from post-mortem images.

      In Figure 3a of our revised manuscript, we added post-mortem histological images showing the exact location of the optic fiber implants in the flocculus. These images provide clear confirmation that the optogenetic stimulation was targeted to the correct anatomical region, ensuring that the observed effects are attributable to CF manipulation in the flocculus.

      Reviewer #2 (Recommendations For The Authors):

      (1) The efficacy of CF suppression is questionable. The histology in Figure 1 shows that only a handful of CFs are transduced in their approach. This observation casts doubt on the claimed complete suppression of CF-evoked EPSCs in every recorded PC in the same figure. This necessitates a more detailed explanation for this apparent discrepancy. Also, the absence of current-clamp recordings to measure the effect on CF-evoked complex spiking in PCs and the lack of detail regarding the timing of optogenetic actuation (continuous or pulsed) during these slice experiments are also significant omissions.

      We now provide additional in vivo electrophysiological recordings showing sustained CF suppression in awake animals (Figure 2). These recordings directly demonstrate the extent of CF-evoked complex spike (Cs) suppression.

      Moreover, we have included additional data of current-clamp recordings to measure the impact of CF suppression on Cs activity (Figures 1E and 1F). Regarding the timing of the optogenetic actuation, the stimulation was applied continuously in the slice experiments.

      (2) The authors claim that their method effectively suppresses CF activity in vivo, yet they do not present any supporting data. Given the histological evidence provided, it's questionable whether their approach truly impacts the CF population broadly, casting doubts on the efficacy of their suppression approach to identify the role of CFs during behavior. To address these concerns, further experiments and detailed quantification are essential to validate the extent and uniformity of CF suppression achieved.

      As we responded earlier, we conducted additional in-vivo experiments with continuous recordings of CF-evoked complex spike (Cs) activity during optogenetic suppression (Figure 2). These data directly demonstrate effective and sustained inhibition of CF transmission throughout the behavioral experiments. Quantification of CF suppression revealed consistent inhibition across the manipulation period, with no observable alterations in Purkinje cell simple spike firing rates, confirming that our intervention specifically targeted CF activity without off-target effects. In addition to the in-vivo data, the in-vitro data presented in Figure 1 (lines 107~116) further validate the efficacy of our optogenetic manipulation, showing consistent suppression of CF transmission without any failures. These findings collectively confirm the reliability and specificity of our suppression approach for studying CF contributions to behavior.

      (3) To optogenetically test the role of CFs in memory consolidation, the authors deliver continuous, high-power light to the flocculus (13 mW for 6 hrs). This extends well beyond typical experimental conditions. The sustained nature of the light exposure thus brings into question the consistency and reliability of CF suppression over time. Firstly, it is imperative to determine whether CF activity is suppressed throughout this extended period. Secondly, the intensity and duration of light exposure carry a significant risk of causing extensive damage to the surrounding tissue. Given these concerns, a thorough histological examination is warranted to assess the potential adverse effects on tissue integrity. Such an analysis is crucial not only for validating the experimental outcomes but also for ensuring that the observed effects are not confounded by light-induced tissue damage.

      To address whether CF activity is suppressed throughout the extended period, we included new in-vivo recordings demonstrating robust suppression of CF transmission, as evidenced by inhibited complex spikes sustained at 40 minutes of optostimulation. Regarding potential tissue damage, our optogenetic protocol used a light intensity of 13 mW, which is much lower than the 75 mW threshold reported by Cardin et al. (2010) as sufficient to maintain normal neuronal activity. Moreover, critical damage typically requires intensities exceeding 100 mW for several hours (Cardin, Jessica A., et al. "Targeted optogenetic stimulation and recording of neurons in vivo using cell-type-specific expression of Channelrhodopsin-2." Nature protocols 5.2 (2010): 247-254). Finally, we observed no abnormalities in long-term memory consolidation or gain retrieval (Figures 3C, 4C, 4F), further supporting that our light stimulation did not induce tissue damage.

      (4) The generalizability of their findings to various learning behaviors remains uncertain. Given that the flocculus plays a role in vestibulo-ocular reflex (VOR) adaptation, which encompasses both CF-dependent and CF-independent learning types (gain increase and gain decrease, respectively), this system could offer a more feasible approach for investigating hypotheses about the role of CFs in guiding distinct learning processes.

      In response to the reviewer’s comment on the generalizability of our findings to learning behaviors involving both CF-dependent and CF-independent mechanisms, we acknowledge the importance of examining these dynamics in cerebellar motor adaptation systems, such as the OKR. Although our study used an OKR task, findings from VOR studies apply here. Ke et al. (2009) demonstrated that VOR gain increases (CF-dependent) and gain decreases (CF-independent) involve distinct plasticity processes (Ke, Michael C., Cong C. Guo, and Jennifer L. Raymond. "Elimination of climbing fiber instructive signals during motor learning." Nature neuroscience 12.9 (2009): 1171-1179), suggesting that CF engagement is task-dependent, particularly for larger error signals that require CF-guided adaptation.

      Similarly, our OKR findings suggest that CF-dependent pathways are likely used for large, persistent errors, whereas CF-independent mechanisms may drive more gradual adjustments. This alignment between OKR and VOR systems supports the generalizability of CF-selective adaptation across cerebellar learning tasks. We have elaborated on this point in our revised manuscript (lines 219~237), clarifying how CF-dependent and CF-independent mechanisms can generalize across motor learning contexts in the cerebellum.

      (5) The acute effect of CF suppression on OKR eye movements warrants investigation. If OKR eye movements are altered by their method, this could complicate the interpretation of their results.

      During our experiments, we monitored ocular movements during CF optogenetic manipulation and found no aberrant effects, such as nystagmus. As shown in Figures 4G and 4H, disrupting CF signaling during gain retrieval did not alter the gain, confirming that our manipulation neither acutely affects ocular reflexes nor induces abnormal eye movements. We therefore conclude that the observed effects are specific to learning and memory processes.

      (6) The authors raise the potential issue of inducing presynaptic LTD in CFs. Can they be sure that their manipulation doesn't generate a similar effect? Additional controls or techniques to accurately interpret the results are needed considering this concern.

      Our discussion does not claim that optogenetic suppression directly induces CF-LTD. Instead, we posit that CF suppression may have mimicked the functional consequences of CF-LTD, such as reduced complex spike (Cs) activity and associated calcium signaling. This, in turn, may have indirectly interfered with the induction of parallel fiber-Purkinje cell (PF-PC) LTD, thereby preventing gain enhancement during learning.

      This hypothesis is consistent with previous studies highlighting the interplay between CF and PF synaptic plasticity in cerebellar motor learning. For example, Hansel and Linden (2000) and Weber et al. (2003) discuss how changes at CF synapses can modulate Cs waveforms and calcium dynamics, which are critical for PF-PC LTD. Coesmans et al. (2004) and Han et al. (2007) further elaborate on the necessity of CF input for effective PF-PC LTD induction during learning tasks such as retinal slip correction.

      While our experiments were not designed to directly measure CF-LTD, the observed prevention of gain enhancement aligns with the hypothesis that CF suppression functionally disrupted downstream PF-PC LTD. We have clarified these points in our revised manuscript (lines 250~258) to avoid misunderstanding.

      (7) The specific timeframe for OKR consolidation remains uncertain, with evidence from numerous studies indicating that cerebellar memory consolidation unfolds over several days. Therefore, a more thorough investigation into these extended durations, supported by control experiments to validate the outcomes, would significantly strengthen the study's conclusions, and provide clearer insights into the consolidation process of OKR learning.

      Our current study specifically focused on the early phase of the post-learning period, as supported by findings from several studies: Cooke et al. (2004); Titley et al. (2007); Steinmetz et al. (2016); Seo et al. (2024).

      These studies collectively indicate that cerebellar-dependent memory consolidation—including OKR—can occur rapidly during the early consolidation phase. While the specific mechanisms examined in these studies vary (e.g., synaptic plasticity, intrinsic plasticity, or circuit-level changes), they consistently demonstrate that modifications in the cerebellum after the early consolidation period no longer influence memory storage or performance. This evidence strongly supports the relevance of our experimental focus and the timing of our interventions.

      We acknowledge the importance of investigating extended consolidation periods, which could indeed provide additional insights. However, given our current aims, the rapid consolidation dynamics observed in the early phase are most relevant to the questions addressed in this study. We have elaborated on these matters in our revised manuscript (lines 273~283).

      (8) Issues around whether the authors have control over CF activity with their optogenetic intervention raise questions of whether learning can be recovered during the training procedure if the optogenetic stimuli are halted. Specifically, if suppression is applied for three blocks (what the authors refer to as "sessions") during the training procedure and then ceases, does learning rapidly recover in the immediately following blocks?

      While we did not directly examine the restoration of learning capability within the same training session following the cessation of optogenetic inhibition, we believe several aspects of our experimental design and insights from prior studies support our interpretation.

      Our optogenetic intervention specifically targeted Purkinje cells (PCs) in the flocculus and was applied continuously during designated training sessions to modulate cerebellar activity. Notably, Medina et al. (2001) demonstrated that transient inactivation of the cerebellar cortex impairs the expression of learned responses but does not disrupt the underlying plasticity mechanisms (Medina, Javier F., Keith S. Garcia, and Michael D. Mauk. "A mechanism for savings in the cerebellum." Journal of Neuroscience 21.11 (2001): 4081-4089.). This finding suggests that cerebellar plasticity remains intact and functional even after transient perturbations.

      Therefore, it is plausible that once optogenetic inhibition is lifted, the cerebellar network regains its capacity for learning and adaptation, as the intrinsic plasticity and memory encoding processes remain preserved. While we acknowledge that direct experimental confirmation of rapid recovery in our setup was not performed, this interpretation is consistent with our experimental framework and the broader literature.

      (9) The study does not fully explore the instructive signals/mechanisms underlying the memory consolidation process. A detailed investigation into potential instructive signals for consolidation beyond CF-induced signaling, like the simple spiking of PCs, could significantly enhance the study's conclusions. Indeed, there is currently no evidence to suggest that CFs play a role in the consolidation phase anyway so testing their role seems a bit of a strawman argument.

      While our study primarily focused on characterizing CF-dependent pathways, we acknowledge that memory consolidation is likely driven by a multifaceted interplay of instructive signals beyond CF-induced mechanisms. In particular, Purkinje cell (PC) simple spiking may act as a critical signal during the consolidation phase, either complementing or functioning independently of CF input. Emerging evidence suggests that simple spiking can modulate downstream circuitry in ways that stabilize and strengthen memory traces.

      To address this, we have expanded the discussion in the revised manuscript to explore potential instructive signals for consolidation, including PC simple spiking, local circuit plasticity within the cerebellar cortex, and its interaction with the cerebellar nuclei. We propose that these mechanisms collectively contribute to the transfer and stabilization of motor memory, offering a more comprehensive framework for understanding consolidation. We have elaborated on these matters in our revised manuscript (lines 238~250).

      (10) Previous reports have highlighted the necessity of CF activity for extinction/memory maintenance (Medina et al. 2002; Kim et al. 2020). That is, the absence of CF activity is consequential for cerebellar function. These results present a potential contrast to the findings reported in this current study. This discrepancy raises important questions about the experimental conditions, methodologies, and interpretations of CF function across different studies. A thorough discussion comparing these divergent outcomes is essential, as it could elucidate the specific contexts or conditions under which CF activity influences memory processes.

      We acknowledge that previous studies (Medina et al., 2002; Kim et al., 2020) have suggested a role for climbing fiber (CF) activity in extinction. However, our study specifically focuses on the acquisition phase of motor learning and does not extend to extinction or maintenance. As such, we have revised our discussion to limit interpretations strictly to the scope of our findings and removed references to extinction.

      The discrepancies between our results and prior work may arise from differences in methodologies and behavioral paradigms. For instance, we utilized optogenetic inhibition to achieve precise temporal and spatial control of CF activity, whereas previous studies employed pharmacological or lesion methods that may have broader effects on the cerebellar circuitry. Additionally, differences in behavioral paradigms, such as the optokinetic reflex (OKR) task used in our study compared to the eye-blink conditioning tasks in prior studies, may demand distinct roles for CF signaling depending on the specific requirements for error correction and adaptation.

      This clarification is now incorporated into our revised manuscript, and the discussion has been streamlined to focus on the phase-specific role of CF activity during acquisition without extending to extinction or maintenance (lines 259~270).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior but none of the experiments involve a social element. Marmosets are recorded in isolation, which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition from cries and subharmonic phees, which are high-entropy calls, to lower-entropy mature phees is affected by social reinforcement from the parents. This effect extends across contexts: differences in these interaction patterns carry over to vocal behavior when the marmosets are alone. From the chord diagrams, cries still constitute a significant proportion of call types in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences, being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not tested, but could be with, for example, playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations as a social signal, such as phees, cries and twitter, even when their family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls when in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to their distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide potential evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introductory text (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to be able to better determine if the conclusions are well supported by the data. Understanding that this is very difficult data to get, the number of marmosets and some variability in the collection of the data would allow for the plotting of each individual across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful. Consequently, we were unable to use the tissue to reconstruct the bilateral extent of the lesion. We did, however, establish the bilateral nature of the lesion from coronal slices of each animal's MRI scan before processing the intact hemisphere. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a figure in the supplementary materials (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions, with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We have now made reference to this supplemental figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twin pairs, it was not possible to have twins in all cases. Only two sets were twins. We have indicated the symbols that represent the twin pairs in Fig. 2 as well as the MRI scans of the twin pairs in Fig. S1. There were no observed changes in histology for the sham animals relative to the other non-sham controls. The MRI scan for one sham animal (CON2) shows herniated tissue in the right hemisphere, which is a normal consequence of brain exposure caused by a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group in the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? In the text, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period from weeks 3 to 6 relative to the proportions in week 2? I think I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition' so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see, in the week 4 chord diagram, the text description of "elevation in the rate of 'other' calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion."

      We apologize for the confusion. Fig 3D and Fig 3F are not directly related. Fig. 3D shows the different types of emitted calls. The figure shows the averaged data per group pooled from post-surgery weeks (week 3 – week 6). It represents the proportion of individual call types relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls comprising tsik, egg, ock, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents the differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transitions obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
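
      As an illustration of the kind of call-to-call probability matrix described above, the following is a minimal sketch that assumes each recording is available as an ordered list of call labels. The function name and the example sequence are hypothetical, and the sketch only row-normalizes raw transition counts; the weighting scheme applied in the study is not specified here and is not reproduced.

```python
from collections import Counter, defaultdict


def transition_probabilities(calls):
    """Return {from_call: {to_call: P(to_call | from_call)}} for consecutive calls."""
    pair_counts = defaultdict(Counter)
    # Count every ordered pair of consecutive calls in the sequence.
    for current_call, next_call in zip(calls, calls[1:]):
        pair_counts[current_call][next_call] += 1

    probabilities = {}
    for current_call, next_counts in pair_counts.items():
        total = sum(next_counts.values())
        # Normalize each row so the outgoing probabilities sum to 1.
        probabilities[current_call] = {
            next_call: count / total for next_call, count in next_counts.items()
        }
    return probabilities


# Hypothetical call sequence, used only for illustration.
sequence = ["phee", "trill", "phee", "twitter", "phee", "trill", "cry"]
matrix = transition_probabilities(sequence)
print(matrix["phee"])
```

      Each row of the resulting matrix would correspond to the links leaving one call type in a chord diagram, with the link thickness proportional to the row entry.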

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the averaged value of the ratios of the number of social contact calls relative to non-social contact calls in each recording per subject per group (i.e., x̄(# social calls / # non-social calls)). This is now included in the figure legend and the axis is updated (page 32).
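
      The averaging described above can be sketched as follows. This is only an illustration: the function name and the counts are hypothetical, and skipping recordings that contain no non-social calls is an assumption made to avoid division by zero, not a rule stated in the response.

```python
from statistics import mean


def mean_social_ratio(recordings):
    """Average the per-recording ratio of social to non-social call counts.

    `recordings` is a list of (social_count, non_social_count) pairs.
    """
    ratios = [
        social / non_social
        for social, non_social in recordings
        if non_social > 0  # assumption: skip recordings with no non-social calls
    ]
    return mean(ratios)


# Hypothetical counts for three recordings from one subject.
recordings = [(12, 6), (9, 3), (5, 10)]
print(mean_social_ratio(recordings))
```

      Note that this is the mean of the ratios, which is generally not the same as the ratio of the pooled counts.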

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry-calls from other social contact calls for two reasons. First, in our sample, we found cry behavior to be highly variable across the animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability in animals of the same group masked the features between animals that reliably differentiated between them. Second, cry-calls elicit feedback from parents who are normally within the vicinity of the infant whereas phee calls elicit antiphonal phee calls from any distantly located conspecific. In other words, the contexts in which these calls are elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls with differing numbers of syllables ranged between 20 and 40. This number varied between subjects and per week. The most common were 3- and 4-syllable phee calls, which ranged from 7 to 15. Due to this variability, Fig. 4B presents the average syllable count. The axis is now updated.

      Figure 4C-E) How is the data combined here? Is there a 2nd syllable, the combined data from the 2nd syllable from phee calls of all lengths (1 - 5?). If so, are there differences based on how long the total sequence is?

      The combined data represent the specific syllable (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee) irrespective of the length of the sequence. No differences were observed between the 2nd syllable in a 2-syllable phee and the 2nd syllable in a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      So duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume. Were there differences in physical growth between the pairs of ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship of the control animals. Do you know of experiments that show that lower entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other presumably because one gets more access to food. With increasing age, we did not observe significant changes in bodyweight between the groups. We examined grip strength in all infants as a means of assessing how well the infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and both parameters were equivalent. We have included an additional figure to show the normal increase in both weight and grip strength to the supplemental materials (Fig. S3) and have made reference to this in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits that low-entropy phee calls represent mature-sounding calls relative to noisy and immature high-entropy calls (e.g., Takahashi et al. 2017). In the current study, the reduction in syllable entropy observed for both groups of animals with increasing age is consistent with this view. At the same time, entropy can relate to vocal complexity; high entropy refers to complex and variable sound patterns whereas low-entropy sounds are predictable, less diverse and simple vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also be lacking in emotion, as observed in humans with ACC damage; these patients show mature speech, but they lack the variations in rhythms, patterns and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of a reduction in phee syllable entropy in the ACC group in the context of being short and loud with reduced peak frequency is consistent with this view. Our use of the word ‘blunt’ was to convey how the calls exhibited by the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant’s family and non-family members was not available. Assessing such behaviors in the animals’ holding room posed some difficulty since marmosets are easily distracted by other animals as well as the presence of an experimenter, amongst other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication that describes the effect of early cingulate ablation on the spontaneous production of ‘separation calls’ largely construed as cries, coos and whimpers in response to maternal separation. All of this work was largely performed in rhesus macaques or squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation of the ACC was more circumscribed to area 24 and is therefore consistent with MacLean’s earlier work showing that removal of ACC alone does not eliminate cry behavior. In adults, ACC ablation is insufficient to eliminate vocalization as well. We make reference to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolated-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct in that cries are directed towards caregivers but in our sample, cry behavior was highly variable between the infants. Consequently, in Fig. 3E social contact calls include phee, twitter and trill calls but do not include cries, which were separated (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6, which were presumably maturing at this time point. Thus, if anything, there was an increase in the proportion of mature contact calls vs immature contact calls with increasing age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was hidden from view because the infants were distracted by it; it was positioned ~12 cm from the marmoset and placed in the exact same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these details, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency ~8 kHz. This concentration of the energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in the frequency content since the energy is distributed across a wider range of frequencies contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic" I would rather search for the lack of frequency modulation, amplitude variation, or narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of the sound for each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency for each phee syllable was lower in the ACC-lesioned monkeys, which may directly translate to the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animals was structurally different from that of the controls, implicating a less complex and more stereotyped ACC signal. Further studies are needed to systematically explore the relationship between entropy and emotional quality of vocalizations.
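
      The reviewer's point about energy concentration can be illustrated with a small sketch. It assumes entropy is the Shannon entropy of the normalized power spectrum, which is one common definition; the exact entropy measure used in the study is not stated in this response, and the two spectra below are hypothetical.

```python
import math


def spectral_entropy(power):
    """Shannon entropy (in bits) of a power spectrum normalized to sum to 1."""
    total = sum(power)
    # Bins with zero power contribute nothing to the entropy.
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical spectra over eight frequency bins.
flat = [1.0] * 8              # energy spread evenly across bins
peaked = [0.1] * 7 + [9.3]    # energy concentrated in a single bin

print(spectral_entropy(flat), spectral_entropy(peaked))
```

      Under this definition, the evenly spread spectrum gives the larger entropy and the concentrated one gives the smaller, which is the direction of the effect the reviewer describes.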

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered for a variety of reasons, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue with the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors was stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for the cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could the message conveyed over long distances be "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting concept. Not all calls emitted by the animals were loud. We specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high amplitude with low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite their propagation through the environment. We have made reference to this in the discussion, where we focus specifically on the phee calls (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals make more cries. Cry calls were highly variable amongst the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, they did not differ significantly from the controls. Due to their high variability, cries were removed in the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why in some groups (e.g. ACC 6 weeks) less than five dots are visible?

      The sample size differed per week because of the lack of recording during the COVID restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample size of the groups in the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly feasible and yes, we were hoping to establish this level of detail with testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early life neurochemical manipulations that show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals’ grip strength and body weight and found normal strength and weight gain in both the controls and the lesioned group. We have added these data as supplemental information (Fig. S3).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low contrast condition which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. Then, we performed a two-way ANOVA, analyzing the effects of spatial features (IN vs. OUT) and visual conditions (contrast and orientation). The results showed a significant effect of the visual feature on both orientation (F = 3.96, p = 0.046) and contrast (F = 14.26, p < 10⁻³). However, neither the spatial feature nor the spatial-visual interaction exhibited significant effects for orientation (F = 0.52, p = 0.473; F = 1.56, p = 0.212) or contrast (F = 2.19, p = 0.139; F = 1.15, p = 0.283).

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F<sub>Contrast</sub>=10.72, p<10<sup>-4</sup>; or across orientations, F<sub>Orientation</sub>=3, p=0.030; stats have been added to Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address the question of whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron), IN vs. OUT conditions, shows a significant correlation during the delay period for each contrast: low contrast (F<sub>Condition</sub>=0.03, p=0.867; F<sub>Frequency</sub>=141.86, p<10<sup>-18</sup>; F<sub>Interaction</sub>=10.70, p=0.002, ANCOVA), middle contrast (F<sub>Condition</sub>=7.18, p=0.009; F<sub>Frequency</sub>=96.76, p<10<sup>-14</sup>; F<sub>Interaction</sub>=0.13, p=0.716, ANCOVA), and high contrast (F<sub>Condition</sub>=12.51, p=0.001; F<sub>Frequency</sub>=333.74, p<10<sup>-29</sup>; F<sub>Interaction</sub>=7.91, p=0.006, ANCOVA).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Looking at the SNR as a ratio of power in the beta band to all other bands, there is no significant drop in SNR between conditions (SNR<sub>IN</sub> = 4.074±984, SNR<sub>OUT</sub> = 4.333±0.834, p=0.341, Wilcoxon signed-rank). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
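      As a rough illustration of the SNR measure described above, the following sketch computes the ratio of spectral power inside a beta band to power outside it. The band limits (14 to 22 Hz), the use of summed rather than mean power, and the function name `beta_snr` are assumptions made for this example and are not taken from the manuscript.

```python
def beta_snr(freqs, power, beta_band=(14.0, 22.0)):
    """Ratio of summed power inside the beta band to summed power outside it.

    freqs and power are equal-length sequences describing a power spectrum.
    The default band limits are illustrative assumptions only.
    """
    if len(freqs) != len(power):
        raise ValueError("freqs and power must have the same length")
    lo, hi = beta_band
    in_band = sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
    out_band = sum(p for f, p in zip(freqs, power) if not (lo <= f <= hi))
    if out_band == 0:
        raise ValueError("no power outside the beta band")
    return in_band / out_band


# Toy spectrum: two bins inside the assumed band, two outside.
snr = beta_snr([10.0, 15.0, 20.0, 30.0], [1.0, 4.0, 4.0, 1.0])
```

      Computed per site and condition, such values could then be compared across IN and OUT with a paired test, as in the response above.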

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I wasn't ever sure why the authors are focused on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Most studies of neural codes supporting WM often focus on coding the remembered information - not other information. Conceptually, it seems that the brain would want to suppress - or at least not enhance - representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement, and when there is no feature correspondence between the remembered and viewed stimuli. (i.e., the interaction between WM and visual input is more obvious for visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and I think quite informative. The authors are correct that this is certainly relevant for sensory recruitment models of WM - there's clear evidence for a role of feedback from PFC to extrastriate cortex - but what role, specifically, each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal was remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature is itself being modeled. But here, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous - do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either of these are reasonable interpretations, yet they're directly opposing one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron doesn't appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Or, the full range of x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as correct completed trials divided by the total number of trials; the target window was a circle with radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.
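      A minimal sketch of this definition follows, assuming that a trial counts as correct when the saccade endpoint lands inside the circular target window. The rule mapping cue eccentricity to a 2 or 4 dva radius is not given here, so the radius is passed in per trial; all names and the example trials are hypothetical.

```python
import math


def is_correct(endpoint, target, radius_dva):
    """True if the saccade endpoint lies within the circular target window."""
    dx = endpoint[0] - target[0]
    dy = endpoint[1] - target[1]
    return math.hypot(dx, dy) <= radius_dva


def percent_correct(trials):
    """trials: list of (endpoint, target, radius_dva) tuples, positions in dva."""
    if not trials:
        raise ValueError("at least one trial is required")
    n_correct = sum(is_correct(e, t, r) for e, t, r in trials)
    return 100.0 * n_correct / len(trials)


# Hypothetical trials: (endpoint, target, window radius).
example_trials = [
    ((10.5, 0.0), (10.0, 0.0), 2.0),  # 0.5 dva error, inside a 2 dva window
    ((13.0, 0.0), (10.0, 0.0), 2.0),  # 3 dva error, outside a 2 dva window
    ((13.0, 0.0), (10.0, 0.0), 4.0),  # 3 dva error, inside a 4 dva window
    ((0.0, 5.0), (0.0, 10.0), 4.0),   # 5 dva error, outside a 4 dva window
]
score = percent_correct(example_trials)
```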

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT" - the RT shown seems extremely high, and perhaps is normalized. Details of this normalization should be included in the Methods. The "scatter" is listed as dva, but it's not clear how scatter is quantified (std dev of endpoint distribution? Mean absolute error), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure, and now provide details in the Methods section. Both RT and saccade error are normalized for each session; details are now provided in the Methods. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors propose a new model of biologically realistic reinforcement learning in the direct and indirect pathway spiny projection neurons in the striatum. These pathways are widely considered to provide a neural substrate for reinforcement learning in the brain. However, we do not yet have a full understanding of mechanistic learning rules that would allow successful reinforcement-learning-like computations in these circuits. The authors outline some key limitations of current models and propose an interesting solution by leveraging learning with efferent inputs of selected actions. They show that the model simulations are able to recapitulate experimental findings about the activity profile in these populations of mice during spontaneous behavior. They also show how their model is able to implement off-policy reinforcement learning.

      Strengths:

      The manuscript has been very clearly written and the results have been presented in a readily digestible manner. The limitations of existing models, that motivate the presented work, have been clearly presented and the proposed solution seems very interesting. The novel contribution of the proposed model is the idea that different patterns of activity drive current action selection and learning. Not only does this allow the model to implement reinforcement learning computations well, but this suggestion may have interesting implications regarding why some processes selectively affect ongoing behavior and others affect learning. The model is able to recapitulate some interesting experimental findings about various activity characteristics of dSPN and iSPN pathway neuronal populations in spontaneously behaving mice. The authors also show that their proposed model can implement off-policy reinforcement learning algorithms with biologically realistic learning rules. This is interesting since off-policy learning provides some unique computational benefits and it is very likely that learning in neural circuits may, at least to some extent, implement such computations.

      We thank the reviewer for the positive comments.

      Weaknesses:

      A weakness in this work is that it isn’t clear how a key component in the model - an efferent copy of selected actions - would be accessible to these striatal populations. The authors propose several plausible candidates, but future work may clarify the feasibility of this proposal.

      We agree that the biological substrate of the efference copy remains a key open question. We discuss potential pathways in the Discussion section of our manuscript and hope that future experimental studies clarify the question.

      Reviewer #2:

      Summary:

      The basal ganglia is often understood within a reinforcement learning (RL) framework, where dopamine neurons convey a reward prediction error that modulates cortico-striatal connections onto spiny projection neurons (SPNs) in the striatum. However, current models of plasticity rules are inconsistent with learning in a reinforcement learning framework.

      This paper proposes a new model that describes how distinct learning rules in direct and indirect pathway striatal neurons allow them to implement reinforcement learning models. It proposes that two distinct components of striatal activity affect action selection and learning. They show that the proposed implementation allows learning in simple tasks and is consistent with experimental data from calcium imaging data in direct and indirect SPNs in freely moving mice.

      Strengths:

      Despite the success of reward prediction errors at characterizing the responses of dopamine neurons as the temporal difference error within an RL framework, the implementation of RL algorithms in the rest of the basal ganglia has been unclear. A key missing aspect has been the lack of a RL implementation that is consistent with the distinction of direct- and indirect SPNs. This paper proposes a new model that is able to learn successfully in simple RL tasks and explains recent experimental results.

      The authors show that their proposed model, unlike previous implementations, can perform well in RL tasks. The new model allows them to make experimental predictions. They test some of these predictions and show that the dynamics of dSPNs and iSPNs correspond to model predictions.

      More generally, this new model can be used to understand striatal dynamics across direct and indirect SPNs in future experiments.

      We thank the reviewer for the positive comments.

      Weaknesses:

      The authors could characterize better the reliability of their experimental predictions and the description of the parameters of some of the simulations.

      In addition to the descriptions in the Methods, we have provided code implementing the key features of our simulations, which should contribute to reproducibility of our results.

      The authors propose some ideas about the specificity of the striatal efferent inputs but should better highlight that this is a key feature of the model whose anatomical implementation has yet to be resolved.

      We have clarified in the Discussion section “Biological substrates of striatal efferent inputs” that these represent assumptions or predictions that have not yet been demonstrated experimentally.

      Reviewer #3:

      Summary:

      This paper points out an inconsistency between the roles of the striatal spiny neurons projecting to the indirect pathway (iSPN) and the synaptic plasticity rule of those neurons expressing dopamine D2 receptors, and proposes a novel, intriguing mechanism in which iSPNs are activated by the efference copy of the chosen action that they are supposed to inhibit.

      The proposed model was supported by simulations and analysis of the neural recording data during spontaneous behaviors.

      Strengths:

      Previous models suggested that the striatal neurons learn action-value functions, but how the information about the chosen action is fed back to the striatum for learning was not clear. The authors pointed out that this is a fundamental problem for iSPNs, which are supposed to inhibit specific actions and whose synaptic inputs are potentiated with dopamine dips.

      The authors propose a novel hypothesis that iSPNs are activated by efference copy of the selected action which they are supposed to inhibit during action selection. Even though intriguing and seemingly unnatural, the authors demonstrated that the model based on the hypothesis can circumvent the problem of iSPNs learning to disinhibit the actions associated with negative reward errors. They further showed by analyzing the cell-type specific neural recording data by Markowitz et al. (2018) that iSPN activities tend to be anti-correlated before and after action selection.

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not correct to call the action value learning using the externally-selected action “off-policy.” Both the off-policy algorithm Q-learning and the on-policy algorithm SARSA update the action value of the chosen action, which can be different from the greedy action implied by the present action values. In standard reinforcement learning terminology, on-policy or off-policy refers to the actions in the subsequent state: whether to use the next action value of the (to be) chosen action or that of the greedy choice, as in equation (7).

      It is worth noting that this paper suggested that dopamine neurons encode on-policy TD errors: Morris G, Nevet A, Arkadir D, Vaadia E, Bergman H (2006). Midbrain dopamine neurons encode decisions for future action. Nat Neurosci, 9, 1057-63. https://doi.org/10.1038/nn1743.

      We regret that we do not completely follow the reviewer’s comment. We use “off-policy” to refer to the fact that, considered in isolation, the basal ganglia reinforcement learning system that we model learns a target policy that may be distinct from the behavioral policy of the organism as a whole.
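      The standard textbook distinction the reviewer refers to can be sketched as follows. This is a generic illustration of the SARSA and Q-learning update targets, not the model in the manuscript; the function names, the dictionary representation of action values, and all numerical values are assumptions chosen for the example.

```python
def sarsa_target(reward, next_q, next_action, gamma):
    """On-policy target: bootstrap from the action actually chosen next."""
    return reward + gamma * next_q[next_action]


def q_learning_target(reward, next_q, gamma):
    """Off-policy target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(next_q.values())


def td_update(q_value, target, alpha):
    """Move the value of the chosen action toward the target (same in both)."""
    return q_value + alpha * (target - q_value)


# Hypothetical action values in the next state.
next_q = {"left": 1.0, "right": 3.0}

# The agent will actually take "left" next, although "right" is greedy.
on_policy = sarsa_target(reward=1.0, next_q=next_q, next_action="left", gamma=0.5)
off_policy = q_learning_target(reward=1.0, next_q=next_q, gamma=0.5)
```

      In both cases the value being updated belongs to the action that was actually taken; the two targets differ only in which next-state action value they bootstrap from.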

      It is also confusing to contrast TD learning and Q-learning, as the latter is considered one type of TD learning. The TD error signal based on the state value function (6) depends on the chosen action a<sub>t−1</sub> implicitly through r<sub>t</sub> and s<sub>t</sub>, via the reward and state transition functions.

      We agree that this was confusing. We have therefore changed the places in our paper where we intended to refer to “TD learning of a value function V (s)” to specifically mention V (s), rather than just “TD learning.”

      It is not clear why interferences of the activities for action selection and learning can be avoided, especially when actions are taken with short intervals or even temporal overlaps. How can the efference copy activation for the previous action be dissociated with the sensory cued activation for the next action selection?

      The non-interference arises from the orthogonality of the difference (action selection) and sum (efference copy) modes, as described in Figure 3. However, we agree with the reviewer that the problem of temporal credit assignment, when many actions are taken before reward feedback is obtained, is present in our model, as in any standard RL model.
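      The orthogonality argument can be illustrated with a toy two-unit example. The sketch below assumes one dSPN and one iSPN activity value per action, and assumes that an efference copy input adds equally to both; these simplifications and all names are ours for illustration and are not the model of Figure 3.

```python
DIFF_MODE = (1.0, -1.0)  # dSPN minus iSPN: assumed action-selection direction
SUM_MODE = (1.0, 1.0)    # dSPN plus iSPN: assumed efference-copy direction


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decompose(dspn, ispn):
    """Project paired (dSPN, iSPN) activity onto the two modes."""
    activity = (dspn, ispn)
    return dot(activity, DIFF_MODE), dot(activity, SUM_MODE)


# The two modes are orthogonal.
modes_inner_product = dot(DIFF_MODE, SUM_MODE)

# Adding a common input to both units changes only the sum component.
before = decompose(0.75, 0.25)
efference = 0.5
after = decompose(0.75 + efference, 0.25 + efference)
```

      In this toy setting the difference component is identical before and after the common input is added, which is the sense in which a sum-mode signal does not interfere with a difference-mode readout.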

      Although it may be difficult to single out the neural pathway that carries the efference copy signal to the striatum, it is desirable to consider its requirements and the different possibilities. A major issue is that the time delay from actions to reward feedback can be highly variable.

      An interesting candidate is the long-latency neurons in the CM thalamus projecting to striatal cholinergic interneurons, which are activated following low-reward actions: Minamimoto T, Hori Y, Kimura M (2005). Complementary process to response bias in the centromedian nucleus of the thalamus. Science, 308, 1798-801. https://doi.org/10.1126/science.1109154.

      We are grateful for the interesting suggestion and reference, which we have added to the manuscript. However, we note that the issue of delayed reward feedback may also be partially addressed by using a sufficiently long eligibility trace.
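      To make the eligibility-trace remark concrete, here is a minimal sketch assuming an exponentially decaying trace and a three-factor update (learning rate times error signal times trace). The decay values, learning rate, and function names are illustrative assumptions, not parameters from the manuscript.

```python
def run_trace(events, decay):
    """Return the eligibility trace at each time step.

    events: sequence of 0/1 flags marking steps at which the synapse was active.
    decay: per-step multiplicative decay in [0, 1]; larger means a longer trace.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    trace = 0.0
    history = []
    for active in events:
        trace = decay * trace + active
        history.append(trace)
    return history


def weight_change(events, reward_step, td_error, decay, lr=0.1):
    """Three-factor update applied when the error signal arrives."""
    return lr * td_error * run_trace(events, decay)[reward_step]


# One action at step 0, feedback arriving three steps later.
events = [1, 0, 0, 0]
short_trace = run_trace(events, decay=0.5)
credit_short = weight_change(events, reward_step=3, td_error=1.0, decay=0.5)
credit_long = weight_change(events, reward_step=3, td_error=1.0, decay=0.9)
```

      With the slower decay, more of the trace survives until the feedback arrives, so the delayed error signal still produces a sizeable update.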

      In the paragraph before Eq. (3), Eq. (1) should be Eq. (2) for the iSPN.

      Corrected.

    1. Author response:

      eLife Assessment

      This manuscript offers important insights into how polyphosphate (polyP) influences protein phase separation differently from DNA. The authors present compelling evidence that polyP distinguishes between protein conformational states, leading to diverse condensate behaviors. However, differences in charge density between polyP and DNA complicate direct comparisons, and the extent to which polyP-driven phase transitions reveal initial protein states remains unclear. Addressing these concerns would strengthen the manuscript's impact for researchers interested in biomolecular condensates, protein dynamics, and stress response mechanisms.

      We thank the editorial team for the favorable assessment. We, however, contest the specific point on the difference in charge density. We have already performed experiments wherein a higher concentration of DNA is used to match the overall ‘concentration of charges’ as in the experiments with polyP (see Figure S6), and we do not identify or observe any differences in the maturation behavior with DNA, i.e. we see only dissolution at both higher and lower concentrations of DNA. Charge density (i.e. the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer which is naturally different between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely through actively engaging and interacting with the ensemble while DNA appears to be a passive player.

      Reviewer #1 (Public review):

      Summary:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together their results provide a unique insight into the physical and structural mechanism by which two unique negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcome addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Strengths:

      Throughout the article, the authors used the correct techniques to probe physical changes within proteins that can be directly linked to phase transition behaviors. Their rigorous experiments create a clear picture of what occurs at the molecular level when CytR and FruR are exposed to either DNA or polyP, which are unique, highly negatively charged biopolymers found within bacteria. This work provides a new view of mechanisms by which bacteria can regulate the cytoplasmic organization upon the induction of stress. Furthermore, this is likely applicable to mammalian and plant cells and likely to numerous proteins that undergo condensation with nucleic acids and other charged biopolymers.

      Weaknesses:

      The biggest weakness of this study is that it compares the phase behavior of enzymes driven by negatively charged polymers that have intrinsic differences in net charge and charge density. Because these properties are extremely important for controlling phase separation, any differences may result in the observed phase transitions driven by DNA and polyP. The authors should perform an additional experiment to control for these differences as best they can. The results from these experiments will provide additional insight into the importance of charge-based properties for controlling phase transitions.

      We thank the reviewer for providing a positive review of our work. On the comment related to the final paragraph, we note that we have already conducted an experiment with a higher DNA concentration (11.24 µM) to explore if the concentration of charges plays any significant role. The results of this experiment are presented in Figure S6. We observe that even at a higher DNA concentration, the condensates dissolve over time. Therefore, the difference in the maturation behavior of condensates with varying initial protein ensembles is due to the nature of polyP (likely through its enhanced flexibility). 

      Reviewer #2 (Public review):

      Summary:

      In this study, Goyal et al. demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate between initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the protein native ensemble is vital in determining whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      Strengths:

      This is a thoroughly conducted study that provides an alternative route for inducing phase separation that is more informative on the initial protein ensemble involved. This is particularly useful and a complementary means to investigate the role played by protein dynamics and plasticity in phase transitions. The authors use an appropriate set of techniques to investigate unique phase transitions within proteins induced by polyphosphates. An alternative protein system is used to corroborate their findings that the unique assemblies induced by polyphosphates when compared to DNA are not restricted to a single system. The work here is well-documented, easy to interpret, and of relevance for the condensate community.

      Weaknesses:

      The major weakness of this manuscript is that it is unclear if the information on the initial protein conformational ensemble can be determined solely from the assembly and maturation behavior and the discrimination abilities of polyphosphates. In both systems studied (CytR and FruR), polyphosphate discriminates and results in unique assemblies and maturation behaviors based on the initial protein ensemble. However, it seems the assembly and maturation behavior are not a direct result of the degree of conformational dynamics and plasticity in the initial protein. In the case of CytR, the fully-folded system forms condensates that resolubilize, while the highly disordered state immediately aggregates. Whereas, in the case of FruR, the folded state induces spontaneous aggregation, and the more dynamic, molten globular, system results in short-lived condensates. These results seem to suggest the polyphosphates' ability to discriminate between the initial protein ensemble may not be able to reveal what that initial protein ensemble is unless it is already known.

      We thank the reviewer for providing constructive comments on our work. On the final paragraph: we agree that the outcome does not provide information on the nature of the starting ensemble. As of now, our experimental results are primarily observations on questions related to maturation outcomes when protein ensembles of varying structure, compactness and stability interact with polyP. If there are differences in the native ensemble due to mutations (which at times cannot be revealed by ensemble probes), polyP appears to discern them more efficiently than DNA.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aimed to investigate the effects of optically stimulating the A13 region in healthy mice and a unilateral 6-OHDA mouse model of Parkinson's disease (PD). The primary objectives were to assess changes in locomotion, motor behaviors, and the neural connectome. For this, the authors examined the dopaminergic loss induced by 6-OHDA lesioning. They found a significant loss of tyrosine hydroxylase (TH+) neurons in the substantia nigra pars compacta (SNc) while the dopaminergic cells in the A13 region were largely preserved. Then, they optically stimulated the A13 region using a viral vector to deliver channelrhodopsin (CamKII promoter). In both sham and PD model mice, optogenetic stimulation of the A13 region induced pro-locomotor effects, including increased locomotion, more locomotion bouts, longer durations of locomotion, and higher movement speeds. Additionally, PD model mice exhibited increased ipsilesional turning during A13 region photoactivation. Lastly, the authors used whole-brain imaging to explore changes in the A13 region's connectome after 6-OHDA lesions. These alterations involved a complex rewiring of neural circuits, impacting both afferent and efferent projections. In summary, this study unveiled the pro-locomotor effects of A13 region photoactivation in both healthy and PD model mice. The study also indicates the preservation of A13 dopaminergic cells and the anatomical changes in neural circuitry following PD-like lesions that represent the anatomical substrate for a parallel motor pathway.

      Strengths:

      These findings hold significant relevance for the field of motor control, providing valuable insights into the organization of the motor system in mammals. Additionally, they offer potential avenues for addressing motor deficits in Parkinson's disease (PD). The study fills a crucial knowledge gap, underscoring its importance, and the results bolster its clinical relevance and overall strength.

      The authors adeptly set the stage for their research by framing the central questions in the introduction, and they provide thoughtful interpretations of the data in the discussion section. The results section, while straightforward, effectively supports the study's primary conclusion - the pro-locomotor effects of A13 region stimulation, both in normal motor control and in the 6-OHDA model of brain damage.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Anatomical investigation. I have a major concern regarding the anatomical investigation of plastic changes in the A13 connectome (Figures 4 and 5). While the methodology employed to assess the connectome is technically advanced and powerful, the results lack mechanistic insight at the cell or circuit level into the pro-locomotor effects of A13 region stimulation in both physiological and pathological conditions. This concern is exacerbated by a textual description of results that doesn't pinpoint precise brain areas or subareas but instead references large brain portions like the cortical plate, making it challenging to discern the implications for A13 stimulation. Lastly, the study is generally well-written with a smooth and straightforward style, but the connectome section presents challenges in readability and comprehension. The presentation of results, particularly the correlation matrices and correlation strength, doesn't facilitate biological understanding. It would be beneficial to explore specific pathways responsible for driving the locomotor effects of A13 stimulation, including examining the strength of connections to well-known locomotor-associated regions like the Pedunculopontine nucleus, Cuneiformis nucleus, LPGi, and others in the diencephalon, midbrain, pons, and medulla.

      We initially considered two approaches. The first was to look at specific projections to the motor regions, focusing on the MLR. The second was to utilize a whole-brain analysis, which is presented here. Given what we know about the zona incerta, especially its integrative role, we felt that examining the full connectome was a reasonable starting point.

      The value of the whole-brain approach is that it provides a high-level overview of the afferents and efferents to the region. The changes in the brain that occur following Parkinson-like lesions, such as those in the nigrostriatal pathway, are complex and can affect neighbouring regions such as the A13. Therefore, we wished to highlight the A13, which we considered a therapeutic target, and examine changes in connectivity that could occur following acute lesions affecting the SNc. We acknowledge that this study does not provide a causal link, but it presents the fundamental background information for subsequent hypothesis-driven, focused, region-specific analysis.

      The terms provided were taken from the Allen Brain Atlas terminology and presented as abbreviations. We have added two new figures focusing on motor regions to make the information more comprehensible (new Figures 4 and 5) and rewrote the connectomics section to make it easier to understand.

      Additionally, identifying the primary inputs to A13 associated with motor function would enhance the study's clarity and relevance.

      This is a great point to help simplify the whole-brain results. We have presented the motor-related inputs and outputs as part of a new figure in the main paper (Figure 5) and added accompanying text in the results section. We have also updated the correlation matrices to concentrate on motor regions (Figure 4). This highlights possible therapeutic pathways. We have also enhanced our discussion of these motor-related pathways. We have retained the entire dataset and added it to our data repository for those interested.

      The study raises intriguing questions about compensatory mechanisms in Parkinson's disease and a new perspective on the preservation of dopaminergic cells in A13, despite the SNc degeneration, and the plastic changes to input/output matrices. To gain inspiration for a more straightforward reanalysis and discussion of the results, I recommend the authors refer to the paper titled "Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon" from the David Kleinfeld laboratory. This could guide the authors in investigating motor pathways across different brain regions.

      Thank you for the advice. As pointed out, Kleinfeld’s group presented their data in a nice, focused way. For the connectomic piece, we have added Figure 5, which provides a better representation than our previous submission.

      (2) Description of locomotor performance. Figure 3 provides valuable data on the locomotor effects of A13 region photoactivation in both control and 6-OHDA mice. However, a more detailed analysis of the changes in locomotion during stimulation would enhance our understanding of the pro-locomotor effects, especially in the context of 6-OHDA lesions. For example, it would be informative to explore whether the probability of locomotion changes during stimulation in the control and 6-OHDA groups. Investigating reaction time, speed, and total distance could reveal how A13 is influencing locomotion, particularly after 6-OHDA lesions. The laboratory of Whelan has a deep knowledge of locomotion and the neural circuits driving it, so these features may be instructive to infer insights on the neural circuits driving movement. Along the same lines, examining features like the frequency or power of stimulation related to walking patterns may help elucidate whether A13 is engaging with the Mesencephalic Locomotor Region (MLR) to drive the pro-locomotor effects. These insights would provide a more comprehensive understanding of the mechanisms underlying A13-mediated locomotor changes in both healthy and pathological conditions.

      Thank you for these suggestions. We have reorganized Figure 3 to highlight the metrics by separating the 6-OHDA from the Sham experiments (3F-J, which highlights distance travelled, average speed and duration). We have also added text to the manuscript to highlight these metrics more clearly. We have relabelled Supplementary Figure S3, which presents reaction time as latency to initiate locomotion, and updated the main text to address the reviewer's points.

      Reviewer #2 (Public Review):

      Summary:

      The paper by Kim et al. investigates the potential of stimulating the dopaminergic A13 region to promote locomotor restoration in a Parkinson's mouse model. Using wild-type mice, 6-OHDA injection depletes dopaminergic neurons in the substantia nigra pars compacta, without impairing those of the A13 region and the ventral tegmental area, as previously reported in humans. Moreover, photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region improves bradykinesia and akinetic symptoms after 6-OHDA injection. Whole-brain imaging with retrograde and anterograde tracers reveals that the A13 region undergoes substantial changes in the distribution of its afferents and projections after 6-OHDA injection. The study suggests that if the remodeling of the A13 region connectome does not promote recovery following chronic dopaminergic depletion, photostimulation of the A13 region restores locomotor functions.

      Strengths:

      Photostimulation of presumably excitatory (CAMKIIa) neurons in the vicinity of the A13 region promotes locomotion and locomotor recovery of wild-type mice 1 month after 6-OHDA injection in the medial forebrain bundle, thus identifying a new potential target for restoring motor functions in Parkinson's disease patients.

      Weaknesses:

      Electrical stimulation of the medial Zona Incerta, in which the A13 region is located, has been previously reported to promote locomotion (Grossman et al., 1958). Recent mouse studies have shown that if optogenetic or chemogenetic stimulation of GABAergic neurons of the Zona Incerta promotes and restores locomotor functions after 6-OHDA injection (Chen et al., 2023), stimulation of glutamatergic ZI neurons worsens motor symptoms after 6-OHDA (Lie et al., 2022).

      Thank you - we have added this reference. It is helpful as Grossman did stimulate the zona incerta in the cat and elicit locomotion, suggesting that stimulation of the area in normal mice has external validity. Grossman’s results prompted a later clinical examination of the zona incerta, but it concentrated on the zona incerta regions close to the subthalamic regions (Ossowska 2019), further caudal to the area we focused on. Chen et al. (2023) targeted the area in the lateral aspect of central/medial zona incerta, formed by dorsal and ventral zona incerta, which may account for the differing results. Our data were robust for stimulation of the medial aspect of the rostromedial zona incerta. The thigmotactic behaviour that we observed in our work that focused on CamKII neurons has not been observed with chemogenetic, optogenetic activation or with photoinhibition of GABAergic central/medial ZI (Chen et al. 2023).

      GABAergic activation of mZI to Cuneiform projections (Sharma et al. 2024) also did not produce thigmotactic behavior. We have added these points to the discussion.

      Although CAMKIIa is a marker of presumably excitatory neurons and can be used as an alternative marker of dopaminergic neurons, behavioral results of this study raise questions about the neuronal population targeted in the vicinity of the A13 region. Moreover, if YFP and CHR2-YFP neurons express dopamine (TH) within the A13 region (Fig. 2), there is also a large population of transduced neurons within and outside of the A13 region that do not, thus suggesting the recruitment of other neuronal cell types that could be GABAergic or glutamatergic.

      We found that CamKII transfection of the A13 region was extremely effective in promoting locomotor activity, which was critical for our work in exploring its possible therapeutic potential. We have since quantified the cell number and found that the c-fos cell number was increased following ChR2 activation. There is evidence of TH activation - but the data suggest that other cell types contribute. C-fos alone is a blunt tool to assess specificity - rather, it is better at showing overall photostimulus efficacy - which we have demonstrated. Moreover, there is evidence that cell types are not purely dopaminergic, with GABA co-localized (Negishi et al. 2020). We acknowledge that specific viral approaches that target the GABAergic, glutamatergic, and dopaminergic circuits would be very useful. The range of tools to target A13 dopaminergic circuits is more limited than for the SNc, for example, because the A13 region lacks DAT, and TH-IRES-Cre approaches, while helpful, are less specific than DAT-Cre mouse models. Intersectional approaches targeting multiple transmitters (glutamate & dopamine, for example) may be one solution, as we do not expect that a single transmitter-specific pathway would work as well as broad targeting of the A13 region. Our recent work suggests that GABAergic neuron activation may have more general effects on behaviour rather than control of ongoing locomotor parameters (Sharma et al. 2024). Recent work shows a positive valence effect of dopamine A13 activation on motivated food-seeking behavior, which differs from consummatory behavior observed with GABAergic modulation (Ye, Nunez, and Zhang 2023). Chemogenetic inactivation and ablation of dopaminergic A13 revealed that they contribute to grip strength and prehensile movements, uncoupling food-seeking grasping behavior from motivational factors (Garau et al. 2023). Overall, this suggests differing effects of GABA compared to DA and/or glutamatergic cell types, consistent with our effects of stimulating CamKII. The discussion has been updated.

      Regarding the analysis of interregional connectivity of the A13 region, there is a lack of specificity (the viral approach did not specifically target the A13 region), the number of mice is low for such correlation analyses (2 sham and 3 6-OHDA mice), and there are no statistics comparing 6-OHDA versus sham (Fig. 4) or contra- versus ipsilesional sides (Fig. 5). Moreover, the data are too processed, and the color matrices (Fig. 4) are too packed in the current format to enable proper visualization of the data. The A13 afferents/efferents analysis is based on normalized relative values; absolute values should also be presented to support the claim about their upregulation or downregulation.

      Generally, papers using tissue-clearing imaging approaches have low sample sizes due to technical complexity and challenges. The technical challenges of obtaining these data were substantial in both collection and analysis. There are multiple technical complexities arising from dual injections (A13 and MFB coordinates) and targeting the area correctly. The A13 region is difficult to target as it spans only around 300 µm in the anterior-posterior axis. While clearing the brain takes weeks, and light-sheet imaging also takes time, the time necessary to analyze the tissue using whole-brain quantification is labor intensive, especially with a lack of a standardized analysis pipeline from atlas registrations, signal segmentations, and quantifications. The field is still relatively new, requiring additional time to refine pipelines.

      Correlation matrices are often used in analyzing connectivity patterns on a brain-wide scale, as they can identify any observable patterns within a large amount of data. We used correlation matrices to display estimated correlation coefficients between the afferent and efferent proportions from one brain subregion to another across 251 brain regions in total in a pairwise manner (not for hypothesis testing). We provided descriptive statistics (mean and error bars) in the original Figure 5C and G. As mentioned in comments for Reviewer 1, we have now presented the data in revised Figures 4 and 5, which focus specifically on motor-related pathways to provide information on possible pathways. This has simplified the correlation matrices and highlighted the differences in the 6-OHDA efferent data especially. As suggested, raw values are shared in a supplemental file on our data repository.

      In the absence of changes in the number of dopaminergic A13 neurons after 6-OHDA injection, results from this correlation analysis are difficult to interpret as they might reflect changes from various impaired brain regions independently of the A13 region.

      We acknowledge that models of Parkinson’s disease, particularly those using 6-OHDA, induce plasticity in various regions, which may subsequently affect A13 connectivity. We aim to emphasize the residual, intact A13 pathways that could serve as therapeutic targets in future investigations. This emphasis is pertinent in the context of potential clinical applications, as the overall input and output to the region fundamentally dictate the significance of the A13 region in lesioned nigrostriatal models. We agree with the reviewer that the changes certainly can be independent of A13; however, the fact that there was a significant change in the connectome post-6-OHDA injection and striatonigral degeneration is in and of itself important to document. We have added a sentence acknowledging this limitation to the discussion.

      There is no causal link between anatomical and behavioral data, which raises questions about the relevance of the anatomical data.

      This point was also addressed earlier in response to a comment from Reviewer 1. Focusing on specific motor pathways is one avenue to explore. However, given that the zona incerta acts as an integrative hub, we believed it was prudent to initially examine both afferent and efferent pathways using a brain-wide approach. For instance, without employing this methodology, the potential significance of cortical interconnectivity to the A13 region might not have been fully appreciated. As mentioned previously, we will place additional emphasis on motor-related regions in our revised paper, thereby enhancing the relevance of the anatomical data presented. With these modifications, we anticipate that our data will underscore specific motor-related targets for future exploration, employing optogenetic targeting to assess necessity and sufficiency.

      Overall, the study does not take advantage of genetic tools accessible in the mouse to address the direct or indirect behavioral and anatomical contributions of the A13 region to motor control and recovery after 6-OHDA injection.

      Our study has not specifically targeted neurons that express dopaminergic, glutamatergic, or GABAergic properties (refer to earlier comment for more detail). However, like others, we find that targeting one neuronal population often does not result in a pure transmitter phenotype. For instance, evidence suggests co-localization of dopamine neurons with a subpopulation of GABA neurons in the A13/medial zona incerta (Negishi et al. 2020). In the hypothalamus, research by Deisseroth and colleagues (Romanov et al. 2017) indicates the presence of multiple classes of dopamine cells, each containing different ratios of co-localized peptides and/or fast neurotransmitters. Consequently, we believe our work lays the foundation for the investigations suggested by the reviewer. Furthermore, if one considers this work in the context of a preclinical study to determine whether the A13 might be a target in human Parkinson's disease, the existing technology that could be utilized is deep brain stimulation (DBS) or electrical modulation, which would also affect different neuronal populations in a non-specific manner.

      While optogenetic stimulation therapy is a longer-term prospect, using CamKII combined with the DJ hybrid AAV could be a translatable strategy for targeting A13 neuronal populations in non-human primates (Watakabe et al. 2015; Watanabe et al. 2020). We have added this point to the discussion.

      Reviewer #3 (Public Review):

      Kim, Lognon et al. present an important finding on pro-locomotor effects of optogenetic activation of the A13 region, which they identify as a dopamine-containing area of the medial zona incerta that undergoes profound remodeling in terms of afferent and efferent connectivity after administration of 6-OHDA to the MFB. The authors claim to address a model of PD-related gait dysfunction, a contentious problem that can be difficult to treat with dopaminergic medication or DBS in conventional targets. They make use of an impressive array of technologies to gain insight into the role of A13 remodeling in the 6-OHDA model of PD. The evidence provided is solid and the paper is well written, but there are several general issues that reduce the value of the paper in its current form, and a number of specific, more minor ones. Also, some suggestions, that may improve the paper compared to its recent form, come to mind.

      Thank you for the suggestions and careful consideration of our work - it is appreciated.

      The most fundamental issue that needs to be addressed is the relation of the structural to the behavioral findings. It would be very interesting to see whether the structural heterogeneity in afferent/efferent projections induced by 6-OHDA is related to the degree of symptom severity and motor improvement during A13 stimulation.

      As mentioned in comments for Reviewer 1, we have performed additional analysis and present this in Figure 5. We have also revised Figure 4, focusing on motor regions. Our work will provide a roadmap for future studies to disentangle divergent or convergent A13 pathways that are involved in different or all PD-related motor symptoms. Because we could not measure behavioural change in the same animals studied with the anatomic study (essentially because the optrode would have significantly disrupted the connectome we are measuring), we cannot directly compare behaviour to structure.

      The authors provide extensive interrogation of large-scale changes in the organization of the A13 region afferent and efferent distributions. It remains unclear how many animals were included to produce Fig 4 and 5. Fig S5 suggests that only 3 animals were used, is that correct? Please provide details about the heterogeneity between animals. Please provide a table detailing how many animals were used for which experiment. Were the same animals used for several experiments?

      The behavioral set and the anatomical set were necessarily distinct. In the anatomical experiments, we employed both anterograde and retrograde viral approaches to target the afferent and efferent A13 populations with fluorescent proteins. For the behavioral approach, a single ChR2 opsin was utilized to photostimulate the A13 region; hence combining the two populations was not feasible. We were also concerned that the optrode itself would interfere with connectomics. A lower number of animals were used for the whole-brain work due to technical limitations described earlier. We have now provided additional information regarding numbers in all figures and the text. Using Spearman’s correlation analysis, we found afferent and efferent proportions across animals to be consistent, with an average correlation of 0.91, which is reported in Figure S6.
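      As an illustration only, the between-animal consistency check described above can be sketched as a mean pairwise Spearman correlation over per-region proportion profiles. This is not the authors' pipeline: the function names, the animal labels, and the five-region profiles below are hypothetical, and the real analysis spans 251 regions.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(profiles):
    """profiles maps an animal label to its per-region proportions,
    listed in the same region order for every animal."""
    pairs = combinations(sorted(profiles), 2)
    return mean(spearman(profiles[a], profiles[b]) for a, b in pairs)


# Hypothetical proportions for three animals over five regions (not real data).
profiles = {
    "mouse_1": [0.40, 0.25, 0.20, 0.10, 0.05],
    "mouse_2": [0.38, 0.27, 0.18, 0.12, 0.05],
    "mouse_3": [0.35, 0.20, 0.25, 0.12, 0.08],
}
consistency = mean_pairwise_spearman(profiles)
```

      Rank-based correlation is used in this sketch because proportions across brain regions are typically skewed, so agreement in the ordering of regions is more informative than agreement in raw values.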

      While the authors provide evidence that photoactivation of the A13 is sufficient in driving locomotion in the OFT, this pro-locomotor effect seems to be independent of 6-OHDA-induced pathophysiology. Only in the pole test do they find that there seems to be a difference between Sham vs 6-OHDA concerning the effects of photoactivation of the A13. Because of these behavioral findings, optogenetic activation of A13 may represent a gain of function rather than disease-specific rescue. This needs to be highlighted more explicitly in the title, abstract, and conclusion.

      Optogenetic activation of A13 may represent a gain of function in both healthy and 6-OHDA mice, highlighting a parallel descending motor pathway that remains intact. 6-OHDA lesions have multiple effects on motor and cognitive function. This makes a single pathway unlikely to rescue all deficits observed in 6-OHDA models. The lack of locomotion observed in 6-OHDA models can be reversed by A13 region photostimulation. Therefore, this is a reversal of a loss of function, in this case. However, the increase in turning represents a gain of function. We have highlighted this as suggested in the discussion.

      The authors claim that A13 may be a possible target for DBS to treat gait dysfunction. However, the experimental evidence provided (in particular the lack of disease-specific changes in the OFT) seems insufficient to draw such conclusions. It needs to be highlighted that optogenetic activation does not necessarily have the same effects as DBS (see the recent review from Neumann et al. in Brain: https://pubmed.ncbi.nlm.nih.gov/37450573/). This is important because ZI-DBS so far had very mixed clinical effects. The authors should provide plausible reasons for these discrepancies. Is cell-specificity, which only optogenetic interventions can achieve, necessary? Can new forms of cyclic burst DBS achieve similar specificity (Spix et al, Science 2021)? Please comment.

      Thank you for the valuable comments. They have been incorporated into the discussion.

      Our study highlights a parallel motor pathway provided by the A13 region that remains intact in 6-OHDA mice and can be sufficiently driven to rescue the hypolocomotor pathology observed in the OFT and overcome bradykinesia and akinesia. The photoactivation of ipsilesional A13 also has an overall additive effect on ipsiversive circling, representing a gain of function on the intact side that contributes to the magnitude of overall motor asymmetry against the lesioned side. The effects of DBS are rather complex, ranging from micro-, meso-, to macro-scales, involving activation, inhibition, informational lesioning, and network interactions. This could contribute to the mixed clinical effects observed with ZI-DBS, in addition to differences in targeting and DBS programming among the studies (see the review by Ossowska 2019). Also, DBS studies targeting the ZI have never targeted the rostromedial ZI, which extends towards the hypothalamus and contains the A13. Furthermore, DBS and electrical stimulation of neural tissue, in general, are always limited by current spread and lower thresholds of activation of axons (e.g., axons of passage), both of which can reduce the specificity of the true therapeutic target. Optogenetic studies have provided mechanistic insights that could be leveraged in overcoming some of the limitations in targeting with conventional DBS approaches. Spix et al. (2021) provided an interesting approach highlighting these advancements. They devised burst stimulation to facilitate population-specific neuromodulation within the external globus pallidus. Moreover, they found a complementary role for optogenetics in exploring the pathway-specific activation of neurons activated by DBS. To ascertain whether A13 DBS may be a viable therapy for PD gait, it will be necessary to perform many more preclinical experiments, and tuning of DBS parameters could be facilitated by optogenetic stimulation in these murine models. We have added this to the discussion.

      In a recent study, Jeon et al (Topographic connectivity and cellular profiling reveal detailed input pathways and functionally distinct cell types in the subthalamic nucleus, 2022, Cell Reports) provided evidence on the topographically graded organization of STN afferents and McElvain et al. (Specific populations of basal ganglia output neurons target distinct brain stem areas while collateralizing throughout the diencephalon, 2021, Neuron) have shown similar topographical resolution for SNr efferents. Can a similar topographical organization of efferents and afferents be derived for the A13/ ZI in total?

      The ZI can be subdivided into four subregions in the antero-posterior axis: rostral (ZIr), dorsal (ZId), ventral (ZIv), and caudal (ZIc) regions. The dorsal and ventral ZI are also referred to together as the central/medial/intermediate ZI. There are topographical gradients in different cell types and connectivity across these subregions (see reviews: Mitrofanis 2005; Monosov et al. 2022; Ossowska 2019). Recent work by Yang and colleagues (2022) demonstrated a topographical organization among the inputs and outputs of GABAergic (VGAT) populations across the four ZI subregions. Given that the A13 region encompasses a smaller portion (the medial aspect) of both the rostral and medial/central ZI (three of four ZI subregions) and coexpresses VGAT, the A13 region likely falls under the rostral and intermediate medial ZI datasets reported in Yang et al. (2022). With our data, we would not be able to capture the breadth of topographical organization shown in Yang et al. (2022).

      In conclusion, this is an interesting study that can be improved by taking into consideration the points mentioned above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 2 indeed presents valuable information regarding the effects of A13 region photoactivation. To enhance the comprehensiveness of this figure and gain a deeper understanding of the neurons driving the pro-locomotor effect of stimulation, it would be beneficial to include quantifications of various cell types:

      • cFos-Positive Cells/TH-Positive Cells: it can help determine the impact of A13 stimulation on dopaminergic neurons and the associated pro-locomotor effect in the healthy condition and especially in the context of Parkinson's disease (PD) modeling.

      • cFos-Positive Cells /TH-Negative Cells: Investigating the number of TH-negative cells activated by stimulation is also important, as it may reveal non-dopaminergic neurons that play a role in locomotor responses. Identifying the location and characteristics of these TH-negative cells can provide insights into their functional significance.

      We have completed this analysis. The data is presented in Figure 2F, where we show increased c-fos intensity with photoactivation. We observed an increase in the number of cells activated in the A13 region. However, we did not definitively see increases in TH+ cells, suggesting a heterogeneous set of neurons responsible for the effects—possibly glutamatergic neurons.

      Incorporating these quantifications into Figure 2 would enhance the figure's informativeness and provide a more comprehensive view of the neuronal populations involved in the locomotor effects of A13 stimulation.

      We have added text and a new graph.

      (2) Refer to Figure 3. In the main text (page 5) when describing the animal with 6-OHDA the wrong panels are indicated. It is indicated in Figure 2A-E but it should be replaced with 3A-E.

      Please do that.

      Done, and we have updated the figure to improve readability, by separating the 6-OHDA findings from sham in all graphs.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Page 1: Inhibitory or lesion studies will be necessary to support the claim that the global remodeling of afferent and efferent projections of the A13 region highlights the Zona Incerta's role as a crucial hub for the rapid selection of motor function.

      Overall, there is quite a bit of evidence that the zona incerta is a hub for afferent/efferents.

      Mitrofanis (2005) and, more recently, Wang et al. (2020) summarize some of the evidence. Yang (2022) illustrates that the zona incerta shows multiple inputs to GABAergic neurons and outputs to diverse regions. Recent work suggests that the zona incerta contributes to various motor functions such as hunting, exploratory locomotion, and integrating multiple modalities (Zhao et al. 2019; Wang et al. 2019; Monosov et al. 2022; Chometton et al. 2017). The introduction has been updated.

      Introduction

      Page 2, paragraph 2: "However, little attention has been placed on the medial zona incerta (mZI), particularly the A13, the only dopamine-containing region of the rostral ZI" Is the A13 region located in the rostral or medial ZI or both?

      It should have been written “rostromedial” ZI. The A13 is located in the medial aspect of rostromedial ZI. Introduction has been updated.

      Page 2, para 3: Li et al (2021) used a mini-endoscope to record the GCaMP6 signal. Masini and Kiehn, 2022 transiently blocked the dopaminergic transmission; they never used 6-OHDA.

      Please correct through the text.

      Corrected.

      Page 2, para 4: the A13 connectome encompasses the cerebral cortex,... MLR. The MLR is a functional region, correct this for the CNF and PPN.

      Corrected.

      Page 3, the last paragraph of the introduction could be clarified by presenting the behavioral data first, followed by the anatomy.

      This has been corrected.

      Figure 1 is nice and clear, and well summarizes the experimental design.

      Thank you.

      Figure 2 shows an example of the extent of the ChR2-YFP expression and the position of an optical fiber tip above the dopaminergic A13 region from a mouse. Without any quantification, these images could be included in Figure 1. Despite a very small volume (36.8nL) of AAV, the extent of ChR2-YFP expression is quite large and includes dopaminergic and unidentified neurons within the A13 region but also a large population of unidentified neurons outside of it, thus raising questions about the volume and the types of neurons recruited.

      This is an important consideration. The issue of viral spread is complex and depends on factors including tissue type, serotype, and promoter of the virus. Li et al. (2021), for example, used different virus serotypes and promoters, injecting 150 nL, whereas we used AAV-DJ, injecting 36.8 nL. AAV-DJ is a hybrid viral type consisting of multiple serotypes. It has a high transduction efficiency, which leads to greater gene delivery than single-serotype AAV viral constructs (Mao et al. 2016). A secondary consideration regarding translation was that AAV-DJ could effectively transduce non-human primate neurons (Watanabe et al. 2020). We have addressed the issue of neurons recruited earlier, provided c-Fos quantification, and provided a new supplementary figure showing viral spread (Figure S1).

      Anatomical reconstruction of the extent of the ChR2-YFP expression and the location of the tip of the optical fiber will be necessary to confirm that ChR2-YFP expression was restricted to the A13 region.

      We now provide additional information regarding viral spread, ferrule tip placement, and c-fos cell counts. This has been done in Figure 2, and we also present a new Figure S1 in which we have quantified the viral spread.

      Page 5, 1st para: Double-check the references, as not all of them are 6-OHDA injections in the MLF.

      Corrected. Removed Kiehn reference.

      Page 5, 1st para, 4th line: Replace ferrule with optical cannula or fiber.

      Done

      Page 5, 1st para, 9th line: Replace Figure 2 with Figure 3.

      Done

      Page 5, 2nd para: About the refractory decrease in traveled distance by sham-ChR2 mice: is this significant?

      It was not significant (Figure S1C, 1-way RM ANOVA: F(5,25) = 0.486, P = 0.783). This has been updated in the text.
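      For readers unfamiliar with how the two degrees of freedom in a one-way repeated-measures ANOVA arise, the following is a minimal sketch on hypothetical data. The reported values (5 and 25) would be consistent with, for example, six repeated measurements in six animals in a balanced design without a sphericity correction, but that layout is our assumption and not something stated in the response. The function name `rm_anova_oneway` and the numbers in `distances` are illustrative only, and no P value is computed here.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA for a balanced design.

    data[s][c] is the measurement for subject s under condition c.
    Returns (F statistic, df for conditions, df for error).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of repeated conditions
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Between-subject variability is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Hypothetical distances travelled (arbitrary units): 6 animals x 6 trials.
distances = [
    [12.1, 11.8, 12.5, 11.2, 11.9, 11.5],
    [9.4, 9.9, 9.1, 9.6, 9.0, 9.3],
    [14.2, 13.5, 14.0, 13.8, 13.1, 13.9],
    [10.7, 10.2, 10.9, 10.1, 10.5, 10.0],
    [11.3, 11.9, 11.0, 11.6, 11.2, 11.4],
    [13.0, 12.4, 12.8, 12.9, 12.2, 12.6],
]
f_stat, df_cond, df_error = rm_anova_oneway(distances)
print(f"F({df_cond},{df_error}) = {f_stat:.3f}")
```

      With only two conditions the F statistic reduces to the square of the paired t statistic, which is a convenient sanity check for an implementation like this one.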

      Figure 3 showing behavioral assessments is nice, but the stats are not always clear. In Fig 3A, are each of the off and on boxes 1 minute long? The figure legend states the test lasts 1 min, but isn't it 4 minutes? In Figure 3B-E and 3J-M, what are the differences? Do the stats identify a significant difference only during the stimulation phase? Fig. 3F-I are nice and could have been presented as primary examples prior to data analysis in Fig. 3B-E. Group labels above the graph would help.

      Yes, the off-on boxes are 1 minute long. The error is corrected in the legend. Great suggestion for F-I: they have been moved ahead of the summary figures. We have also updated the new Fig. 3 panels (F-I, J, L, M) to make the differences between 6-OHDA and sham graphs easier to visualize. The stats do indicate a significant difference during the stimulation phase. We have added group labels and reorganized the figure, and it is much easier to read now.

      Fig. 3L-M, what do PreSur, Post, and Ferrule mean? I assume that Ferrule refers to mice tested with the optical fiber without stimulation, whereas Stim. refers to the stimulation. It would be helpful to standardize the format of stats in Fig. 3B-E and 3J-M. What are time points a, b, and c referring to?

      We have renamed the figure names to be more intuitive. We have standardized the presentation of statistics in the figure, and eliminated the a,b,c nomenclature. We have also updated the caption to provide descriptions of the tests in Fig 3 L-M.

      Figure S2A: the higher variability in 6-OHDA-YFP mice in comparison to 6-OHDA-ChR2 mice prior to stimulation suggests that 6-OHDA-YFP mice were less impaired. Why use boxplots only for these data? Would a pairwise comparison be more appropriate?

      We have removed these plots from Figure S2. We now present the Baseline to Pre values across the experimental timespan to illustrate the fact that distance travelled returned to baseline values for all trials conducted.

      Fig. S2B: add the statistical marker.

      We have removed this from Figure S2.

      Page 7, para 1, line 8: to add "in comparison to 6-OHDA-YFP and YFP mice" to during photostimulation... (Figure 3E).

      Done

      Page 7, para 3, line 5: about larger improvement, replace "sham ChR2" with "6-OHDA."

      Done

      Page 8, para 1, line 4: Perier et al., 2000 reported that 6-OHDA injection increased the firing frequency of the ZI over a month.

      Added the timeframe to this sentence.

      Page 8, para 2, line 1: Since the results were expected, add some references.

      Done.

      Page 8, para 3, line 4. Double-check the reference.

      Corrected.

      Page 8: About large-scale changes in the A13 region, the relevance of correlation matrices is difficult to grasp. Analysis of local connectivity would have been more informative in the context of GABAergic and glutamatergic neurons of the ZI in the vicinity of the A13 region.

      We have updated the figures for connectivity throughout the manuscript. Overall, there are new Figures 4 and 5 in the main text. We also provide a revised Supplementary Figure 8. Unfortunately, we could not do that experiment regarding local connectivity. In light of our new work (Sharma et al. 2024), it is clear that this will be critical going forward.

      Page 8, para 3, line: given Fig. 2, there is concern about the claim that only the A13 region was targeted. The time of the analysis after 6-OHDA should be mentioned. Some sections of the paragraph could be moved to methods.

      We have provided more information about the viral spread in the text and Supplementary Figure 1. The functional and anatomical experiments are separate, which we realize caused confusion. We have mentioned analysis time after 6-OHDA and inserted this into the text.

      Fig. 4: The color code helps the reader visualize distribution differences. However, statistical analyses comparing 6-OHDA versus sham should be included. Quantification per region would greatly help readers visualize the data and support the conclusion. The relationship between the type of correlation (positive or negative) and absolute change (increase or decrease) is unknown in the current format, which limits the interpretation of the data. Moreover, examples of raw images of axons and cells should be presented for several brain regions. The experimental design with a timeline, as in Fig. 1, would be helpful. The legend for Fig. 4 is a bit long. Some sections are very descriptive, whereas others are more interpretive.

      We have provided a new Figure 5 where we present quantification per region, and the correlation matrices have been updated in Figure 4. We have also focused on motor regions as mentioned earlier. We also provide examples of raw regions in Supplementary Figure 8. Raw values are shared on our data repository.

      Page 10, para 1, line 1: add "afferent" to "changes in -afferent and- projection patterns."

      Done

      Page 10, para 1, line 9: remove the 2nd "compared to sham" in the sentence.

      Done

      Page 10, para 1, line 10: remove "coordinated" in "several regions showed a coordinated reduction in afferent density." We cannot say anything about the timing of events, as there is only info at 1 month.

      Done

      Page 10, para 2: the section should be written in the past tense.

      Done

      Page 13, para 2, the last sentence is overstated. Please remove "cells" and refer to the A13 region instead.

      Done

      About differential remodelling of the A13 region connectome: Figure 5C and 5G: The proportion of total afferents ipsi- and contralateral to 6-OHDA injection argues that the A13 region primarily receives inputs from the cortical plate and the striatum. Unfortunately, there are no statistics.

      Due to the small sample size, we provided descriptive statistics (mean and error bars) in Figure 5A. As mentioned in comments for Reviewers 1 and 2, we have revised Figure 5 to present data focusing on motor-related pathways to provide clarity. In addition, absolute values are shared on our data repository.

      Figure 5 D and 5H: Changes in the proportion of total afferents/projections are relatively modest (less than 10% of the whole population for the highest changes). There is no standard deviation for these data and no statistics. Do they reflect real changes or variability from the injection site?

      The changes are relatively modest (less than 10%) since a small brain region usually provides a small proportion of total input (McElvain et al. 2021; Yang et al. 2022). The changes in the proportions reflect real differences between average proportions observed in sham and 6-OHDA mice. The variability in the total labelling of neurons and fibers was minimized by normalizing individual regional counts against total counts found in each animal. This figure has been updated as reviewers requested.
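      The normalization described above can be sketched as follows. Each animal's regional counts are divided by that animal's total, so differences in overall labelling efficiency cancel out before group means are compared. The region names and counts are hypothetical, and the helper names are ours; this is an illustration of the idea rather than the pipeline used in the study.

```python
def to_proportions(counts):
    """Convert one animal's regional counts to proportions of its own total."""
    total = sum(counts.values())
    return {region: c / total for region, c in counts.items()}


def mean_proportions(animals):
    """Average the per-animal proportions across a group of animals."""
    props = [to_proportions(a) for a in animals]
    return {r: sum(p[r] for p in props) / len(props) for r in props[0]}


def change_in_percentage_points(sham, lesioned):
    """Difference of group mean proportions, in percentage points."""
    m_sham, m_les = mean_proportions(sham), mean_proportions(lesioned)
    return {r: 100 * (m_les[r] - m_sham[r]) for r in m_sham}


# Hypothetical counts; the second sham animal has 10x more total labelling.
sham = [
    {"striatum": 10, "cortex": 30, "other": 60},
    {"striatum": 100, "cortex": 300, "other": 600},
]
lesioned = [
    {"striatum": 8, "cortex": 32, "other": 60},
    {"striatum": 16, "cortex": 64, "other": 120},
]
print(change_in_percentage_points(sham, lesioned))
```

      In this toy example the two sham animals yield identical proportions despite a tenfold difference in raw counts, which is the property the normalization is meant to provide.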

      Fig 5F and H: The example in F shows a huge decrease in the striatum, but H indicates only a 2% change, which makes the example not very representative. Absolute values would be helpful.

      While a 2% change may seem small, it represents a relatively large change in the A13 efferent connectome. To provide further clarity, we have provided absolute values as suggested in our new supplemental table.

      Figure 6 is inaccurate and unnecessary.

      Figure 6 has been removed.

      Discussion

      Although interesting, the discussion is too long.

      The discussion has been reduced by about three quarters of a page.

      Methods

      Page 17, para 1: include the stereotaxic coordinates of the optical cannula above the A13 region.

      Added.

      References

      Chen, Fenghua, Junliang Qian, Zhongkai Cao, Ang Li, Juntao Cui, Limin Shi, and Junxia Xie. 2023. “Chemogenetic and Optogenetic Stimulation of Zona Incerta GABAergic Neurons Ameliorates Motor Impairment in Parkinson’s Disease.” iScience 26 (7). https://doi.org/10.1016/j.isci.2023.107149.

      Chometton, S., K. Charrière, L. Bayer, C. Houdayer, G. Franchi, F. Poncet, D. Fellmann, and P. Y. Risold. 2017. “The Rostromedial Zona Incerta Is Involved in Attentional Processes While Adjacent LHA Responds to Arousal: C-Fos and Anatomical Evidence.” Brain Structure & Function 222 (6): 2507–25.

      Garau, Celia, Jessica Hayes, Giulia Chiacchierini, James E. McCutcheon, and John Apergis-Schoute. 2023. “Involvement of A13 Dopaminergic Neurons in Prehensile Movements but Not Reward in the Rat.” Current Biology: CB, October. https://doi.org/10.1016/j.cub.2023.09.044.

      Li, Zhuoliang, Giorgio Rizzi, and Kelly R. Tan. 2021. “Zona Incerta Subpopulations Differentially Encode and Modulate Anxiety.” Science Advances 7 (37): eabf6709.

      Mao, Yingying, Xuejun Wang, Renhe Yan, Wei Hu, Andrew Li, Shengqi Wang, and Hongwei Li. 2016. “Single Point Mutation in Adeno-Associated Viral Vectors -DJ Capsid Leads to Improvement for Gene Delivery in Vivo.” BMC Biotechnology 16 (January):1.

      McElvain, Lauren E., Yuncong Chen, Jeffrey D. Moore, G. Stefano Brigidi, Brenda L. Bloodgood, Byung Kook Lim, Rui M. Costa, and David Kleinfeld. 2021. “Specific Populations of Basal Ganglia Output Neurons Target Distinct Brain Stem Areas While Collateralizing throughout the Diencephalon.” Neuron 109 (10): 1721–38.e4.

      Mitrofanis, J. 2005. “Some Certainty for the ‘Zone of Uncertainty’? Exploring the Function of the Zona Incerta.” Neuroscience 130 (1): 1–15.

      Monosov, Ilya E., Takaya Ogasawara, Suzanne N. Haber, J. Alexander Heimel, and Mehran Ahmadlou. 2022. “The Zona Incerta in Control of Novelty Seeking and Investigation across Species.” Current Opinion in Neurobiology 77 (December):102650.

      Negishi, Kenichiro, Mikayla A. Payant, Kayla S. Schumacker, Gabor Wittmann, Rebecca M. Butler, Ronald M. Lechan, Harry W. M. Steinbusch, Arshad M. Khan, and Melissa J. Chee. 2020. “Distributions of Hypothalamic Neuron Populations Coexpressing Tyrosine Hydroxylase and the Vesicular GABA Transporter in the Mouse.” The Journal of Comparative Neurology 528 (11): 1833–55.

      Ossowska, Krystyna. 2019. “Zona Incerta as a Therapeutic Target in Parkinson’s Disease.” Journal of Neurology. https://doi.org/10.1007/s00415-019-09486-8.

      Romanov, Roman A., Amit Zeisel, Joanne Bakker, Fatima Girach, Arash Hellysaz, Raju Tomer, Alán Alpár, et al. 2017. “Molecular Interrogation of Hypothalamic Organization Reveals Distinct Dopamine Neuronal Subtypes.” Nature Neuroscience 20 (2): 176–88.

      Sharma, Sandeep, Cecilia A. Badenhorst, Donovan M. Ashby, Stephanie A. Di Vito, Michelle A. Tran, Zahra Ghavasieh, Gurleen K. Grewal, Cole R. Belway, Alexander McGirr, and Patrick J. Whelan. 2024. “Inhibitory Medial Zona Incerta Pathway Drives Exploratory Behavior by Inhibiting Glutamatergic Cuneiform Neurons.” Nature Communications 15 (1): 1160.

      Spix, Teresa A., Shruti Nanivadekar, Noelle Toong, Irene M. Kaplow, Brian R. Isett, Yazel Goksen, Andreas R. Pfenning, and Aryn H. Gittis. 2021. “Population-Specific Neuromodulation Prolongs Therapeutic Benefits of Deep Brain Stimulation.” Science 374 (6564): 201–6.

      Wang, Xiyue, Xiaolin Chou, Bo Peng, Li Shen, Junxiang J. Huang, Li I. Zhang, and Huizhong W. Tao. 2019. “A Cross-Modality Enhancement of Defensive Flight via Parvalbumin Neurons in Zona Incerta.” eLife 8 (April). https://doi.org/10.7554/eLife.42728.

      Wang, Xiyue, Xiao-Lin Chou, Li I. Zhang, and Huizhong Whit Tao. 2020. “Zona Incerta: An Integrative Node for Global Behavioral Modulation.” Trends in Neurosciences 43 (2): 82–87.

      Watakabe, Akiya, Masanari Ohtsuka, Masaharu Kinoshita, Masafumi Takaji, Kaoru Isa, Hiroaki Mizukami, Keiya Ozawa, Tadashi Isa, and Tetsuo Yamamori. 2015. “Comparative Analyses of Adeno-Associated Viral Vector Serotypes 1, 2, 5, 8 and 9 in Marmoset, Mouse and Macaque Cerebral Cortex.” Neuroscience Research 93 (April):144–57.

      Watanabe, Hidenori, Hiromi Sano, Satomi Chiken, Kenta Kobayashi, Yuko Fukata, Masaki Fukata, Hajime Mushiake, and Atsushi Nambu. 2020. “Forelimb Movements Evoked by Optogenetic Stimulation of the Macaque Motor Cortex.” Nature Communications 11 (1): 3253.

      Yang, Yang, Tao Jiang, Xueyan Jia, Jing Yuan, Xiangning Li, and Hui Gong. 2022. “Whole-Brain Connectome of GABAergic Neurons in the Mouse Zona Incerta.” Neuroscience Bulletin 38 (11): 1315–29.

      Ye, Qiying, Jeremiah Nunez, and Xiaobing Zhang. 2023. “Zona Incerta Dopamine Neurons Encode Motivational Vigor in Food Seeking.” bioRxiv: The Preprint Server for Biology, June. https://doi.org/10.1101/2023.06.29.547060.

      Zhao, Zheng-Dong, Zongming Chen, Xinkuan Xiang, Mengna Hu, Hengchang Xie, Xiaoning Jia, Fang Cai, et al. 2019. “Zona Incerta GABAergic Neurons Integrate Prey-Related Sensory Signals and Induce an Appetitive Drive to Promote Hunting.” Nature Neuroscience 22 (6): 921–32.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, where we also address some of the issues discussed here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We include it now.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although it may also be confusing for readers to see a graph that represents the data differently from the models. Maybe showing both graphs (as 3.A and 3.B) can solve it?

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      Sorry about that; it was missed and is now included. Nevertheless, the interquartile ranges of the posterior distributions are still shown here by the boxes, while the whiskers represent the credible intervals.
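      To make the distinction concrete, the sketch below summarizes a set of posterior draws in the way the response describes: the box spans the interquartile range of the draws and the whiskers span a credible interval. The response does not state the interval width or type, so the 95% equal-tailed interval used here is our assumption, and the function names and draws are hypothetical.

```python
def quantile(sorted_vals, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def posterior_box_summary(draws, cred_mass=0.95):
    """Box = interquartile range; whiskers = equal-tailed credible interval.

    cred_mass is an assumed interval width, not one taken from the study.
    """
    s = sorted(draws)
    tail = (1 - cred_mass) / 2
    return {
        "whisker_low": quantile(s, tail),
        "q1": quantile(s, 0.25),
        "median": quantile(s, 0.5),
        "q3": quantile(s, 0.75),
        "whisker_high": quantile(s, 1 - tail),
    }


# Hypothetical posterior draws for one coefficient (evenly spaced for clarity).
draws = [i / 100 for i in range(101)]
summary = posterior_box_summary(draws)
print(summary)
```

      In a conventional box plot the whiskers would instead extend to the most extreme points within 1.5 interquartile ranges of the box, which is why the caption needs to state the convention explicitly.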

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the title and the results and methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented does not convincingly demonstrate that CAPE attenuates C. difficile disease and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was employed to directly assess the protective capacity of CAPE against cell death induced by TcdB. Our observations at 1 and 12 h post-TcdB exposure revealed that CAPE effectively mitigated the toxic effects of the TcdB at both time points, demonstrating its potent protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Auto processing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD domain is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB's self-cleavage and blocking the activity of GTD, thereby preventing the glycosylation modification of Rac1 by TcdB.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on the third day post-infection (Figure 6B). The dry/wet stool ratio showed a comparable pattern, suggesting that treatment with CAPE significantly ameliorated C. difficile-induced diarrhea (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory factor infiltration, and epithelial damage served as key evaluation metrics. Statistical analysis revealed that the pathological scores of mice treated with CAPE were markedly reduced compared to those in the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with an atoxicogenic strain.

      The observed reduction in C. difficile fecal colonization following drug treatment may be attributed to CAPE's antitoxin properties or its capacity to modify the intestinal microbiota and metabolites. These two mechanisms likely work in tandem to combat CDI. CDI is primarily triggered by the toxins A (TcdA) and B (TcdB) secreted by the bacterium. Certain therapies, including monoclonal antibodies like bezlotoxumab, target CDI by neutralizing these toxins, thereby mitigating gut damage and subsequent C. difficile colonization(1,2). The establishment of C. difficile in the gut is intricately linked to the equilibrium of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt the microbial balance, potentially facilitating the overgrowth of other pathogens. Consequently, interventions such as fecal microbiota transplantation (FMT) are designed to reestablish gut flora balance and consequently decrease C. difficile colonization(3,4). Moreover, the administration of probiotics and prebiotics is considered to reduce C. difficile colonization by modifying the gut environment(5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The uptake and dissemination of melatonin within the body varies with the dose administered. For instance, in rats, the bioavailability of melatonin following administration was found to be 53.5%, whereas in dogs, bioavailability was nearly complete (100%) at a dose of 10 mg/kg, yet it decreased to 16.9% at a lower dose of 1 mg/kg(7). This data suggests that the absorption of melatonin differs across various animal species and is influenced by the dose administered. Moreover, it underscores the higher potential bioavailability of melatonin, implying that a dose of 200 mg/kg should be adequate to achieve the desired concentration in the body post-administration.
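
      The unit conversion behind the reviewer's "~ 400 micromolar" figure can be checked with a short calculation. The sketch below is illustrative only: it assumes a molar mass of 232.28 g/mol for melatonin (C13H16N2O2), and the function name is hypothetical rather than taken from the manuscript.

```python
# Molar mass of melatonin (C13H16N2O2) in g/mol; assumed value for this sketch.
MELATONIN_MOLAR_MASS = 232.28


def ug_per_ml_to_micromolar(conc_ug_per_ml: float, molar_mass: float) -> float:
    """Convert a mass concentration in ug/mL to a molar concentration in uM."""
    grams_per_litre = conc_ug_per_ml * 1e-3  # 1 ug/mL equals 1 mg/L, i.e. 1e-3 g/L
    return grams_per_litre / molar_mass * 1e6  # mol/L converted to umol/L


melatonin_um = ug_per_ml_to_micromolar(100.0, MELATONIN_MOLAR_MASS)
print(f"100 ug/mL melatonin is about {melatonin_um:.0f} uM")
```

      Under this assumption, 100 µg/mL corresponds to roughly 430 µM, in line with the reviewer's estimate.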

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?

      We incorporated CAPE into the MIC assay for detecting C. difficile, as well as for assessing the sporulation capacity of C. difficile and evaluating the secretion level of TcdB. The findings revealed that CAPE markedly repressed tcdB transcription at a concentration of 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at a concentration of 32 μg/mL. Please see Figure S3.

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec; 84: 102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22; 9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have included and refreshed several references to enhance the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water ecosystems. A study has identified highly diverse strains of this bacterium within environmental samples(1). Moreover, the significant presence of C. difficile in soil and lawn specimens collected near Australian hospitals indicates that the organism is indeed a common inhabitant in the environment(2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. Following pre-incubation of TcdB with CAPE, compounds that had not bound to TcdB were removed by centrifugation. The persistence of the compound in the culture post-washing could result in an inflated assessment of its efficacy, particularly if it continues to engage with TcdB or the cells beyond the initial 1-hour pre-incubation window. The carryover of the compound might also give rise to misleading positive results, where the compound seems to confer protection or inhibition against TcdB-mediated cell rounding, whereas such effects are actually due to the lingering activity of the compound. This carryover could skew the determination of the compound's minimum effective concentration, as the effective concentration interacting with the cells might be inadvertently elevated. Furthermore, if the compounds possess cytotoxic properties or impact cell viability, carryover could generate artifacts in cell morphology that are unrelated to the direct interaction between TcdB and the compounds.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" are indeed synonymous, and we have subsequently clarified that control mice refer to those that have not been infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.
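
      A paired effect-size and p-value cut-off, such as the "fold change > 1.5, p < 0.05" criterion quoted here, can be applied as a simple filter. The following is a minimal sketch under stated assumptions: the feature names and values are hypothetical and do not come from the study, and whether the authors treated the fold-change cut-off symmetrically (counting decreases as well as increases) is not stated, so both options are shown.

```python
import math

# Hypothetical example values; none of these numbers come from the study.
features = [
    {"name": "metabolite_A", "fold_change": 2.10, "p_value": 0.003},
    {"name": "metabolite_B", "fold_change": 1.20, "p_value": 0.010},
    {"name": "metabolite_C", "fold_change": 0.40, "p_value": 0.020},
    {"name": "metabolite_D", "fold_change": 3.00, "p_value": 0.200},
]


def passes_cutoffs(feature, fc_cutoff=1.5, p_cutoff=0.05, symmetric=True):
    """Return True if a feature meets both the fold-change and p-value cut-offs.

    With symmetric=True, a decrease below 1/fc_cutoff counts as a change too
    (an assumption; the source only states "fold change > 1.5").
    """
    fc = feature["fold_change"]
    if symmetric:
        changed = abs(math.log2(fc)) > math.log2(fc_cutoff)
    else:
        changed = fc > fc_cutoff
    return changed and feature["p_value"] < p_cutoff


selected = [f["name"] for f in features if passes_cutoffs(f)]
selected_up_only = [f["name"] for f in features if passes_cutoffs(f, symmetric=False)]
print(selected, selected_up_only)
```

      Requiring both conditions means that a large but statistically unsupported change (metabolite_D in this toy data) is excluded, which is the point of reporting the p-value cut-off alongside the effect-size threshold.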

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we have confirmed that melatonin significantly diminishes TcdB-induced cytotoxicity at the cellular level (Figure 9A). Additionally, it has been documented that melatonin, acting as an antimicrobial adjuvant and anti-inflammatory agent, can decrease the recurrence of CDI(3). Consequently, we contend that the aforementioned statement is substantiated.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors, including its capacity to suppress the secretion of specific pro-inflammatory mediators. We have updated this depiction in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11): e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec; 60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB proteins used in this study are TcdB1 subtypes.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. For the purposes of this study, we employed a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T. Each cell line was selected to serve a specific experimental objective. The THP-1 cell line was included because a macrophage cell line was needed to make the experiments comprehensive, allowing both epithelial cells and macrophages to be tested. C. difficile is an intestinal pathogen, and immune clearance plays a vital role during infection, so THP-1 cells were used to represent an important immune cell type.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensorgram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Typically, small molecules and proteins engage in a rapid binding and dissociation dynamic. However, as depicted in Figure 4A, the interaction between CAPE and TcdB demonstrates a gradual progression towards equilibrium. This behavior can be primarily explained by the swift occupation of the protein's primary binding sites by the small molecule in the initial stages. Subsequently, CAPE binds to secondary or lower affinity sites, extending the time needed to reach equilibrium. Additionally, the likelihood of CAPE binding to multiple sites on TcdB requires time for the exploration and occupation of these diverse locations before equilibrium is attained. We have incorporated an analysis of this potential scenario into the manuscript.

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have completed the corresponding modifications according to the suggestions.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: This number of casualties 56.000 is still happening or it was in the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a downturn in the incidence and fatality of CDI around 2017(1), as the infection gained broader recognition. Nonetheless, a recent study reveals that the mortality rate for CDI cases in Germany can soar to 45.7% within a year, with the overall economic burden amounting to approximately 1.6 billion euros. This underscores the ongoing significance of CDI as a global public health challenge(2).

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors try the caffeic acid with the TcdA or binary toxin? I know this is not the purpose of the study, but TcdA toxin has a high identity structure with TcdB and generates inflammation in the gut via neutrophils. Negative strains for the major toxins and positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile? Or does it only gain the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently employed in research(3-6), was kindly donated by Professor Aiwu Wu. We have meticulously noted the PCR ribotype in our manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016. It is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, recently (2022) the FDA approved Rebyota, which was later cited by the authors.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". There is only one reference cited by the authors. I suppose that if it is true, more studies should be mentioned. Why are probiotics with Bacteroides spp. not available in the market?

      Thanks for your comments. We have supplemented additional references. The scarcity of probiotic products containing Bacteroides spp. on the market is primarily attributable to the stringent requirements of their survival conditions. As most Bacteroides spp. are anaerobic, they thrive in oxygen-deprived environments. This unique survival trait poses challenges in maintaining their viability during product preservation and distribution, which in turn escalates production costs and complexity. Furthermore, despite the significant role of Bacteroides in gut health, research into its potential probiotic benefits and safety is comparatively underexplored.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4; 11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      Chabukswar et al analysed endogenous retrovirus (ERV) Env variation in a set of primate genomes using consensus Env sequences from ERVs known to be present in hominoids using a Blast homology search with the aim of characterising env gene changes over time. The retrieved sequences were analysed phylogenetically, and showed that some of the integrations are LTR-env recombinants.

      Strengths

      The strength of the manuscript is that such an analysis has not been performed yet for the subset of ERV Env genes selected and most of the publicly available primate genomes.

      Weaknesses

      Unfortunately, the weaknesses of the manuscript outnumber its strengths. Especially the methods section does not contain sufficient information to appreciate or interpret the results. The results section contains methodological information that should be moved, while the presentation of the data is often substandard. For instance, the long lists of genomes in which a certain Env was found could better be shown in tables. Furthermore, there is no overview of the primate genomes, or accession numbers, used. It is unclear whether the analyses, such as the phylogenetic trees, are based on nucleotide or amino acid sequences since this is not stated. tBLASTn was used in the homology searches, so one would suppose aa are retrieved. In the Discussion, both env (nt?) and Env (aa?) are used.

      For the non-hominoids, genome assembly of publicly available sequences is not always optimal, and this may require Blasting a second genome from a species. Which should for instance be done for the HML2 sequences found in the Saimiri boliviensis genome, but not in the related Callithrix jacchus genome. Finally, the authors propose to analyse recombination in Env sequences but only retrieve env-LTR recombinant Envs, which should likely not have passed the quality check.

      Since the Methods section does not contain sufficient information to understand or reproduce the results, while the Results are described in a messy way, it is unclear whether or not the aims have been achieved. I believe not, as characterisation of env gene changes over time is only shown for a few aberrant integrations containing part of the LTR in the env ORF.

      We thank the reviewer for the critiques of the manuscript and their constructive suggestions to improve the clarity, methodological rigor, and data presentation.

      (1) The concern regarding the insufficient data in the methods has been resolved in the revised manuscript by adding a supplementary file that contains the genome assemblies that were used to perform the tBLASTn analysis using the reconstructed Env sequences. The requested accession numbers are available for all sequences in the supplementary phylogenetic figures.

      (2) We have also modified the manuscript by moving a portion of the Results section to the Methods section, in particular the full methodological description of the Env reconstruction (Lines 197-231).

      (3) As suggested, the long list of genomes in which the Env tBLASTn hits were obtained, previously given in the Results section, is now provided in table form (Table 2) as an overall summary of the distribution of ERV Env in the genomes, and the genome assemblies are listed in Supplementary file 2.

      (4) As for the point regarding the use of tBLASTn in the homology searches, we used the reconstructed Env amino acid sequences as queries for a tBLASTn similarity search in the primate genomes. The tBLASTn algorithm compares the amino acid query against the nucleotide database translated in all six reading frames, and hence the hits obtained are nucleotide sequences (Lines 381-383). These nt sequences were used for all further analyses, such as sequence alignment, phylogenetic analysis and recombination analysis. For better clarity, we have specified the use of env nt alignments in the Methods section to avoid the confusion raised about the Discussion.
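
      The six-frame translation that underlies a tBLASTn search can be illustrated with a short sketch. This is not the BLAST implementation and not the authors' pipeline: it only shows how a nucleotide sequence yields six conceptual protein translations (three forward frames, three on the reverse complement) against which a protein query can be compared. It assumes the standard genetic code, and the function names and toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(seq: str) -> dict:
    """Translate a DNA sequence in the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy sequence for illustration only (not an ERV env sequence).
frames = six_frame_translate("ATGGCCTAA")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

      Because the match is found in a translated frame but reported at coordinates in the nucleotide database, the sequences retrieved from such a search are nucleotide sequences, as described in the response above.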

      (5) For the characterization of the HML supergroup in the squirrel monkey (Saimiri boliviensis) genome, we used the tBLASTn hits obtained in S. boliviensis in the initial analysis to perform comparative genomics in two Platyrrhini genomes available on the UCSC Genome Browser. In particular, this analysis was performed to confirm the presence of specific members of the HML supergroup in the squirrel monkey genome, a presence that has not been previously reported. We used these genome assemblies because of the annotations available on the Genome Browser, especially the RepeatMasker tracks and the comparative genomics tools, which allowed us to use the human genome as a reference. We reported the coordinates of the HML supergroup members retrieved from these assemblies by applying the RepeatMasker custom track; these assemblies contain many ERVs that are not present in the NCBI reference genomes.

      (6) The concern regarding only retrieving env-LTR recombinant Envs has been addressed in the revised results section (Lines 747-758). As also mentioned in the methods section, the RDP software detects the recombinant sequences and a breakpoint position for the recombinant signals and hence we confirmed only those sequences that were predicted as potential recombinant sequences by the RDP software through comparative genomics. All the sequences predicted by the software were env-LTR recombinant and hence we confirmed and reported only those recombinant sequences in the manuscript.

      Reviewer #1 (Recommendations for the authors):

      The paper could be strengthened by:

      - a rigorous rewriting and shortening of the manuscript, thereby eliminating all textbook-like paragraphs, and all biological misinterpretations and confusions. Distinguish between retroviral replication as an exogenous virus, and host genome remodeling affecting ERVs. Rewrite the sections on template switching by RT being the basis for the observed recombinations, while host genome recombinations are far more likely. ERVs with such aberrant env/LTR gene recombination are unlikely to be fit for cross-species transmission. Likely, such a recombinant was generated in a common ancestor. Also, host RNA polymerase II transcribes retroviral RNA (line 79), not RT.

      - check lines 89-90 as pro is part of the pol gene in gamma- and lentiviruses.

      We thank the reviewer for the suggestion. We have revised the manuscript by shortening the introduction section, eliminating the textbook-like paragraphs, and clarifying the recombination mechanism. The revised introduction is at Lines 102-111, and the clarification of the recombination mechanism is provided at Lines 1668-1675.

      - adding much more information to the Methods section, such as which genomes were searched, whether nt or aa sequences were retrieved and analysed, whether multiple genomes of a species were searched, a list of databases used ('various databases' in line 164 does not suffice), etc.

      We thank the reviewer for the observation. As mentioned above, in the revised manuscript we have provided more detailed methods by including a supplementary file listing the genome assemblies used for the tBLASTn analysis and comparative genomics. For the sequence alignment, phylogenetic analysis and recombination analysis we used nt sequences, as is also stated in the revised version. Lastly, all the databases that were used are now mentioned in the Methods section.

      - more information is needed on the alignments and phylogenetic trees. For instance, how were indels treated? How long were the alignments on average regarding informative sites?

      We thank the reviewer for the questions. To answer them, we have added a paragraph (Lines 359-362) describing the reconstruction process in more detail.

      - confirm the findings about the presence or absence of an ERV, such as for the squirrel monkey genome, using additional genomes of the species

      As mentioned above, we used only the genome assemblies available on the Genome Browser because of the annotations they offer. Searching a second NCBI RefSeq genome with the BLAST algorithm does not provide information and annotations as accurate as those of the Genome Browser. Hence, we reported the coordinates for the members of the HML supergroup that were retrieved through the comparative genomic assemblies by applying the RepeatMasker custom track, which contains many ERVs that are not present in the NCBI reference genomes.

      - present the lists of findings in primate genomes on pages 9 and 10 in tables

      We thank the reviewer for the suggestion. We have provided a new table (Table 2) in the revised version summarizing the ERV Env distribution results.

      - a significant limitation of the study is that only env ERVs found in hominoids have been searched in OWM and NWM, not ones specific for monkeys. This should be mentioned somewhere.

      As the reviewer pointed out, the study was designed to explore ERVs’ Env sequences in hominoids, which were then searched in the OWM and NWM genomes. This is now better stated in the introduction at Lines 57-60.

      - define abbreviations at first use (e.g. HML in abstract)

      We thank the reviewer for the suggestion. We have defined the abbreviations in the abstract, where HML is first mentioned (Line 65).

      - explain 'pathological domestication' (line 42). Domestication implies usefulness to the host. And over time, deleterious insertions would have been likely purged from a population.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Furthermore:

      - why begin the discussion with a lengthy description of domestication and syncytins, which is not part of the current study?

      We thank the reviewer for the critique. Accordingly, we have now shortened the part of the discussion section about the domestication of syncytins, mentioning them only as an example at lines 942-944.

      - how can 96 hits have been retrieved for spuma-like envs (line 506), while it was earlier reported (line 333), that the most hits were gamma-like?

      We thank the reviewer for the observation. We have clarified how 96 hits were retrieved for spuma-like envs in lines 670-677 of the discussion section.

      English grammar should be improved throughout the manuscript.

      And I could not open half of the supplementary files

      As suggested, we have revised the English and checked that all files open correctly.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Chabukswar et al. describes a comprehensive attempt to identify and describe the diversity of retroviral envelope (env) gene sequences present in primate genomes in the form of ancient endogenous retrovirus (ERV) sequences.

      Strengths:

      The focus on env can be justified because of the role the Env proteins likely played in determining viral tropism and host range of the viruses that gave rise to the ERV insertions, and to a lesser extent, because of the potential for env ORFs to be coopted for cellular functions (in the rare cases where the ORF is still intact and capable of encoding a functional Env protein). In particular, these analyses can reveal the potential roles of recombination in giving rise to novel combinations of env sequences. The authors began by compiling env sequences from the human genome (from human endogenous retrovirus loci, or "HERVs") to build consensus Env protein sequences, and then they use these as queries to screen other primate genomes for group-specific envs by tBLASTn. The "groups" referred to here are previously described, as unofficial classifications of endogenous retrovirus sequences into three very broad categories - Class I, Class II and Class III. These are not yet formally recognized in retroviral taxonomy, but they each comprise representatives of multiple genera, and so would fall somewhere between the Family and Genus levels. The retrieved sequences are subject to various analyses, most notably they are screened for evidence of recombination. The recombinant forms appear to include cases that were probably viral dead-ends (i.e. inactivating the env gene) even if they were propagated in the germline.

      The availability of the consensus sequences (supplement) is also potentially useful to others working in this area.

      Weaknesses:

      The weaknesses are largely in presentation. Discussions of ERVs are always complicated by the lack of a formal and consistent nomenclature and the confusion between ERVs as loci and ERVs as indirect information about the viruses that produced them. For this reason, additional attention needs to be paid to precise wording in the text and/or the use of illustrative figures.

      We thank the reviewer for the general observation. We have paid additional attention to the wording in the text and figures, and hope to have improved the clarity of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Reviewing the manuscript was a challenge because figures were difficult to read. As provided, the fonts were sometimes too small to read in a standard layout and had to be expanded on screen.

      The tree in Figure 3 could also be made easier to read, for example if the authors collapsed related branches and gave the clusters a single, clear label (this is not necessary, just a suggestion) - especially if the supplementary trees have all the labelled branches for any readers who want specific details.

      I also recommend asking a third party (perhaps a scientific colleague) with fluency in English grammar and familiarity with English scientific idiom to provide some editorial feedback on the text.

      Figure 4 legend is confusing. From the description it sounds like the tree in 4B is a host phylogeny, but it's not clearly stated. And if so, how was the tree generated? Is it based on entire genomes? Include at least enough methodological detail or citations that someone could recreate it, if necessary. The details and how it was done should be briefly mentioned here and in detail in the Methods section.

      We thank the reviewer for the observation. For Figure 4, we have modified the legend and stated more clearly how the phylogenetic tree of the primate genomes was generated using TimeTree. We have also provided further details in the methods section (Lines 475-489).

      As suggested, we have revised the English.

      Line 42 - what is "pathological domestication"? It sounds like a contradiction in terms.

      We thank the reviewer for the observation. We have modified the sentence and provided a clearer explanation of the pathological and physiological consequences of ERVs’ env (lines 52-57).

      Lines 166-167 - the authors use the word "classes" but then use a list of terms that correspond to genera within the Retroviridae. The authors should be cautious here, as "class" and "genus" are both official taxonomic terms with different meanings. Do they mean genus? Or, if a more informal term is needed, perhaps "group"?

      Thank you for the observation. The ERVs have been classified into three classes (Class I, II and III) based on their relatedness to the exogenous retrovirus genera Gammaretrovirus, Betaretrovirus and Spumaretrovirus, respectively, and are therefore referred to in the manuscript according to the nomenclature proposed by Gifford et al., 2018, which is cited at Lines 122-125.

      Line 221- "defferent" should be "different"

      Corrected

      Lines 233-234 - what is meant by "canonical" and "non-canonical" forms? Can the authors please define these two terms?

      Thank you for the question. Canonical refers to sequences that are well-preserved and match the structural and functional features of complete env genes, and non-canonical refers to sequences with significant structural alterations or truncations that deviate from this typical form. This explanation has been added to the revised version at Lines 475-479.

      Line 252 - if/is

      Corrected

      Lines 274-276 needs a citation to the paper(s) that reported this.

      Corrected

      Line 283-285 - this was confusing. How could the authors have noted distinct occurrences and clusters of these if they were excluded from the BLAST analysis? It says the consensus sequences were effectively representing these, but doesn't this raise the possibility that the consensus sequences are not specific enough? Could this also then lead to false identification? Perhaps a few more words to explain should be added.

      We thank the reviewer for the observation. While performing the tBLASTn search we did obtain hits for HERV15, HERVR, ERVV1, ERVV2 and PABL, and we have provided a detailed explanation of this observation in the revised manuscript at lines 619-627.

      Line 298 - missing comma

      Corrected

      Lines 348-351- this list is not a list of recombination mechanisms. Template switching is a mechanism of recombination, but "acquisition" is simply a generic term, "degradation" is not a mechanism, and "cross-species transmission" might be a driver or a result of recombination, but it is not a mechanism of recombination.

      We thank the reviewer for the observation. We have revised the explanation of the recombination events, which now appears in the discussion section because some parts of the results have been moved there (Lines 1058-1065).

      Lines 369-372. It's not clear why this means the event was a "very recent occurrence". Do the authors mean that there were shared integration sites between some of the species, and that these sites lacked the insertions in other species (e.g. gibbon, orangutan, monkeys)?

      For the long section on recombination events involving an env sequence with an LTR in it, can the authors explain how they know when it's a recombination event versus integration of one provirus into another one, followed by recombination between LTRs to generate a solo-LTR?

      We thank the reviewer for the observation. Regarding the very recent occurrence of the recombination event, we have explained it in the revised manuscript at lines 769-824, writing: “In fact, the recombinant sequences were shared only between 4 species of Catarrhini parvorder and were absent in more distantly related primates (such as gibbons, orangutans, etc.). This with the presence of shared recombination sites suggests that the insertion occurred after the divergence of these species, while its absence in others indicate that it is a recombination event.”

      For the observation regarding the env-LTR recombination events, the recombinants were first detected by the RDP software and were further validated through the BLAT search in the genomes available on genome browser. The explanation on how we obtained these env-LTR recombination events is now provided in lines 746-763 of the revised manuscript.

      Methods Lines 151-168 and Figure 1 legend Lines 689-690 - how did the authors distinguish between "translated regions" corresponding to the actual Env protein sequence from translation of the other two reading frames? That is, there must have been substantial "translatable" stretches of sequence in the two incorrect reading frames as well as the reading frame corresponding to Env, so the question is how were the correct ones identified for the reconstruction?

      We thank the reviewer for the observation. We have provided a detailed explanation addressing this question in the methods section (Lines 335-359).
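
      To make the reviewer's question concrete, the following is a minimal sketch of one common heuristic for choosing among forward reading frames: translate each frame and keep the one with the longest stop-free stretch. This is an illustration only and is not taken from the manuscript; the authors' actual procedure is the one described in their Methods. The function names (translate, longest_open_stretch, best_forward_frame) are hypothetical, and the codon table is the standard genetic code.

```python
# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame):
    """Translate a nucleotide string in forward frame 0, 1 or 2.

    Codons containing characters other than A/C/G/T become 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein):
    """Return the longest segment of a translation that contains no stop ('*')."""
    return max(protein.split("*"), key=len)


def best_forward_frame(seq):
    """Pick the forward frame whose longest stop-free stretch is longest.

    Ties are resolved in favour of the lowest frame number.
    """
    stretches = {f: longest_open_stretch(translate(seq, f)) for f in range(3)}
    frame = max(stretches, key=lambda f: len(stretches[f]))
    return frame, stretches[frame]
```

      In practice a length criterion alone is not sufficient, since degraded ERV sequences often carry frameshifts; similarity of each frame's translation to known Env proteins would normally be checked as well.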

      Line 495 - "previously reported" should include citation(s) of the prior report(s).

      We thank the reviewer for the observation. We have provided appropriate citations.

      Line 525 - the authors propose that the mechanism "is the co-packaging of different ERVs in a virus particle". First, I assume they meant to say that RNA from different ERVs is co-packaged. Second, isn't it also possible or likely that these could arise from co-packaging of exogenous retrovirus RNAs and recombination, especially if the related exogenous forms were still circulating at the time these things arose?

      We thank the reviewer for the observation. In the revised manuscript, we have modified the proposed mechanism to also include the possibility of co-packaging of exogenous retrovirus RNAs and recombination (lines 1082-1099).

      Line 686 - env should either be italicized (gene) or capitalized (protein), depending on what the authors intended here.

      We thank the reviewer for the observation. We have corrected the typographical error in the new version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Retroviruses have been endogenized into the genome of all vertebrate animals. The envelope protein of the virus is not well conserved and acquires many mutations hence can be used to monitor viral evolution. Since they are incorporated into the host genome, they also reflect the evolution of the hosts. In this manuscript the authors have focused their analyses on the env genes of endogenous retroviruses in primates. Important observations made include the extensive recombination events between these retroviruses that were previously unknown and the discovery of HML species in genomes prior to the splitting of old and new world monkeys.

      Strengths:

      They explored a number of databases and made phylogenetic trees to look at the distribution of retroviral species in primates. The authors provide a strong rationale for their study design, they provide a clear description of the techniques and the bioinformatics tools used.

      Weaknesses:

      The manuscript is based on bioinformatics analyses only. The reference genomes do not reflect the polymorphisms in humans or other primate species. The analyses thus likely underestimate the amount of diversity in the retroviruses. Further experimental verification will be needed to confirm the observations.

      Not sure which databases were used, but if not already analyzed, ERVmap.com and RepeatMasker are ones that have many ERVs that are not present in the reference genomes. Also, long range sequencing of the human genome has recently become available which may also be worth studying for this purpose.

      We thank the reviewer for the observations and comments. We would like to clarify that the intent of the work was to perform bioinformatics analyses, and so wet-lab experimental verification of the observations is outside the scope of the present manuscript. For the aim of the manuscript, we have used the NCBI reference genomes, while for reporting the coordinates of the HML supergroup in the squirrel monkey genome and the coordinates of the recombination events through BLAT search, we have used genome assemblies available on the Genome Browser with the RepeatMasker custom track, since it has well-represented ERV annotations.

      The suggestion to use long-range sequencing of the human genome is an interesting perspective. In future work we will try to include it in our analysis and to perform experimental verification, since, again, the present work does not include a wet-lab experimental component.

      Reviewer #3 (Recommendations for the authors):

      In a few places the term HERV has been used when describing ERVs in non-human primates. This needs to be corrected.

      We thank the reviewer for the observation. We have checked and accordingly modified the terms in the manuscript wherever necessary.

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed and hierarchical methods used to ensure that the model we report in the main text was the best fit to the data while not overfitting. We are not certain about what is meant by “[a]ddressing model transparency,” but as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates, even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (more complex models). Although details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials, we have clarified for the less familiar reader how each of these steps ensures that the parameters we estimate are identifiable and interpretable, and that the model can reproduce key patterns in the data, supporting the validity of the model. Additionally, we have provided all scripts for estimating the models by linking to our public GitHub repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.
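
      For readers less familiar with this class of model, the two quantities listed above (attribute timing and attribute weighting) can be illustrated with a minimal simulation sketch of a drift diffusion process in which each attribute starts contributing to the drift at its own onset time. This is a simplified illustration under stated assumptions, not the model fitted in the manuscript: the function name simulate_choice and all parameter names (w_taste, w_health, taste_onset, health_onset, boundary, noise_sd) are hypothetical.

```python
import math
import random


def simulate_choice(taste_diff, health_diff, w_taste, w_health,
                    taste_onset, health_onset, boundary=1.0,
                    noise_sd=1.0, dt=0.001, max_time=5.0, rng=None):
    """Simulate one two-option choice with attribute-specific onset times.

    taste_diff / health_diff: rating difference between the two options.
    w_taste / w_health: weights on each attribute (the 'weighting').
    taste_onset / health_onset: time in seconds at which each attribute
    begins to influence the drift (the 'timing').
    Returns (choice, decision_time); choice is 1 or 0 for the upper or
    lower boundary, or None if no boundary is reached before max_time.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    while abs(evidence) < boundary and t < max_time:
        drift = 0.0
        if t >= taste_onset:
            drift += w_taste * taste_diff
        if t >= health_onset:
            drift += w_health * health_diff
        # Euler-Maruyama step: deterministic drift plus Gaussian noise.
        evidence += drift * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    if abs(evidence) < boundary:
        return None, t
    return (1 if evidence > 0 else 0), t
```

      With equal weights, an option that is tastier but less healthy is chosen more often when taste_onset is earlier than health_onset, because taste alone drives the evidence during the interval between the two onsets.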

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can forestall reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Ratcliff & Childers, 2015; Wiecki et al., 2013), and previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Additionally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
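
      The regularizing effect of hierarchical estimation described above can be illustrated with a minimal sketch of partial pooling in a normal-normal model. The sketch assumes known within- and between-participant variances and uses the unweighted mean of the individual estimates as the group mean; the function name shrink_estimates and its arguments are hypothetical, and this is a textbook simplification rather than the estimation procedure used in the study.

```python
def shrink_estimates(individual_means, n_trials, within_var, between_var):
    """Shrink per-participant estimates toward the group mean.

    individual_means: raw per-participant parameter estimates.
    n_trials: number of trials behind each estimate.
    within_var: trial-level noise variance (assumed known).
    between_var: variance of true values across participants (assumed known).
    """
    group_mean = sum(individual_means) / len(individual_means)
    shrunk = []
    for mean, n in zip(individual_means, n_trials):
        # Weight on the participant's own data: approaches 1 as n grows,
        # so estimates backed by few trials are pulled harder toward the group.
        weight = between_var / (between_var + within_var / n)
        shrunk.append(weight * mean + (1.0 - weight) * group_mean)
    return shrunk
```

      For example, with within_var=4.0 and between_var=0.1, an estimate based on 40 trials receives a weight of 0.5 on the participant's own data, whereas one based on 1000 trials receives a weight of about 0.96.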

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more there are parameters to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful for the paper to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes for model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, model fit metrics like WAIC penalize for model complexity based on the number of parameters (Watanabe, 2010). Therefore, it is not the case that more complex models (i.e., having additional parameters) would automatically have lower WAICs. Additionally, we note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss those patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which in fact penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 best fits our data. We have added a sentence to the manuscript to state this explicitly.
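
      To show how the complexity penalty enters, the following is a minimal sketch of a WAIC computation from a matrix of pointwise log-likelihoods, using the commonly cited form in which the effective number of parameters is the summed posterior variance of the log-likelihood. The function name waic and the input layout are assumptions made for illustration; the software used in the study may compute or scale the criterion differently.

```python
import math


def waic(log_lik):
    """Compute WAIC on the deviance scale (lower is better).

    log_lik[s][i] is the log-likelihood of observation i under posterior
    draw s; at least two draws are required.
    Returns (waic_value, p_waic), where p_waic is the complexity penalty.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density (fit term)
    p_waic = 0.0  # effective number of parameters (penalty term)
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Log of the mean likelihood across draws, via log-sum-exp for stability.
        peak = max(column)
        lppd += peak + math.log(sum(math.exp(v - peak) for v in column) / n_draws)
        # Sample variance of the log-likelihood across draws.
        mean = sum(column) / n_draws
        p_waic += sum((v - mean) ** 2 for v in column) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), p_waic
```

      Because p_waic is subtracted from the fit term before scaling, a model whose extra parameters add posterior variability without improving the pointwise fit receives a higher (worse) WAIC.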

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting suggestions and appreciate the opportunity to clarify that we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we also now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect, and that future studies should integrate food choice task data pre and post-affect inductions with measures that capture the specific frequency of loss of control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s perspective and suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added additional sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team has collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors. Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have added to our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the small sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our original acknowledgements of limitations in the Discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the supplementary materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019), and which we now report in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we don’t have any measurements of cravings, we did measure negative urgency. Despite these prior findings, the original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the supplementary materials.      

      References

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), Article 9. https://doi.org/10.1038/s41562-020-0893-y

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study has preliminarily revealed the role of ACVR2A in trophoblast cell function, including its effects on migration, invasion, proliferation, and clonal formation, as well as its downstream signaling pathways.

      Strengths:

      The use of multiple experimental techniques, such as CRISPR/Cas9-mediated gene knockout, RNA-seq, and functional assays (e.g., Transwell, colony formation, and scratch assays), is commendable and demonstrates the authors' effort to elucidate the molecular mechanisms underlying ACVR2A's regulation of trophoblast function. The RNA-seq analysis and subsequent GSEA findings offer valuable insights into the pathways affected by ACVR2A knockout, particularly the Wnt and TCF7/c-JUN signaling pathways.

      Weaknesses:

      The molecular mechanisms underlying this study require further exploration through additional experiments. While the current findings provide valuable insights into the role of ACVR2A in trophoblast cell function and its involvement in the regulation of migration, invasion, and proliferation, further validation in both in vitro and in vivo models is needed. Additionally, more experiments are required to establish the functional relevance of the TCF7/c-JUN pathway and its clinical significance, particularly in relation to pre-eclampsia. Additional techniques, such as animal models and more advanced clinical sample analyses, would help strengthen the conclusions and provide a more comprehensive understanding of the molecular pathways involved.

      Reviewer #2 (Public review):

      Summary:

      ACVR2A is one of a handful of genes for which significant correlations between associated SNPs and the incidences of preeclampsia have been found in multiple populations. It is one of the TGFB family receptors, and multiple ligands of ACVR2A, as well as its coreceptors and related inhibitors, have been implicated in placental development, trophoblast invasion, and embryo implantation. This useful study builds on this knowledge by showing that ACVR2A knockout in trophoblast-related cell lines reduces trophoblast invasion, which could tie together many of these observations. Support for this finding is incomplete, as reduced proliferation may be influencing the invasion results. The implication of cross-talk between the WNT and ACVR2A/SMAD2 pathways is an important contribution to the understanding of the regulation of trophoblast function.

      Strengths:

      (1) ACVR2A is one of very few genes implicated in preeclampsia in multiple human populations, yet its role in pathogenesis is not very well studied and this study begins to address that hole in our knowledge.

      (2) ACVR2A is also indirectly implicated in trophoblast invasion and trophoblast development via its connections to many ligands, inhibitors, and coreceptors, suggesting its potential importance.

      (3) The authors have used multiple cell lines to verify their most important observations.

      Weaknesses:

      (1) There are a number of claims made in the introduction without attribution. For example, there are no citations for the claims that family history is a significant risk factor for PE, that inadequate trophoblast invasion of spiral arteries is a key factor, and that immune responses and renin-angiotensin activity are involved.

      Thank you for pointing out the lack of citations in some parts of the introduction. We have revised the manuscript to include appropriate references for the claims regarding family history as a risk factor for PE, the role of inadequate trophoblast invasion in spiral arteries, and the involvement of immune responses and the renin-angiotensin system. The revised text now includes citations to well-established studies in the field (Salonen Ros et al., 2000; Chappell LC et al., 2021; Brosens et al., 2002; Knofler et al., 2019; Redman CWG et al., 1999; LaMarca B et al., 2008). We believe these additions improve the scientific rigor of the manuscript.

      (2) The introduction states "As a receptor for activin A, ACVR2A..." It's important to acknowledge that ACVR2A is also the receptor for other TGFB family members, with varying affinities and coreceptors. Several TGFB family members are known to regulate trophoblast differentiation and invasion. For example, BMP2 likely stimulates trophoblast invasion at least in part via ACVR2A (PMID 29846546).

      Thank you for highlighting the broader role of ACVR2A as a receptor for multiple members of the TGF-β superfamily. We have revised the introduction to acknowledge that ACVR2A is not only the receptor for activin A but also interacts with other ligands, such as BMP2, which likely stimulates trophoblast invasion via ACVR2A (PMID: 29846546). This addition provides a more comprehensive view of ACVR2A's function in trophoblast biology. While the focus of our current study is on activin A, we agree that ACVR2A's role in mediating the effects of other TGF-β family members is an important topic for future research.

      (3) An alternative hypothesis for the potential role of ACVR2A in preeclampsia is its functions in the endometrium. In the mouse, ACVR2A knockout in the uterus (and other progesterone receptor-expressing cells) leads to embryo implantation failure.

      Thank you for bringing up the potential role of ACVR2A in the endometrium as an alternative hypothesis. We have revised the discussion to acknowledge this possibility and cited relevant studies showing that uterine-specific knockout of ACVR2A in mice leads to embryo implantation failure (Monsivais et al., 2021). This suggests that ACVR2A may play a critical role in uterine receptivity and embryo implantation, which could influence placental development and preeclampsia pathogenesis. While our current study focuses on trophoblast-related functions of ACVR2A, we agree that investigating its role in the uterine environment is an important direction for future research.

      (4) In the description of the patient population for placental sample collections, preeclampsia is defined only by hypertension, and this is described as being in accordance with ACOG guidelines. ACOG requires a finding of hypertension in combination with either proteinuria or one of the following: thrombocytopenia, elevated creatinine, elevated liver enzymes, pulmonary edema, and new-onset unresponsive headache.

      We appreciate the reviewer’s detailed observation regarding the definition of preeclampsia.

      We have reviewed and clarified our description of the diagnostic criteria based on the American College of Obstetricians and Gynecologists (ACOG) guidelines. Specifically, we have revised the definition in the Materials and Methods section under "Collection of Placenta and Decidua Specimens," as follows: In accordance with the guidelines from the American College of Obstetricians and Gynecologists (ACOG, 2023), preeclampsia (PE) is diagnosed as hypertension (systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg on at least two occasions) in combination with one or more of the following: proteinuria (≥300 mg/24-hour urine collection or protein/creatinine ratio ≥0.3), thrombocytopenia, elevated serum creatinine, elevated liver enzymes, pulmonary edema, or new-onset headache unresponsive to treatment.
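
      The revised definition quoted above is a simple decision rule, and it can be sketched in code to make the logic explicit. The following is only an illustrative sketch, not a clinical tool and not part of the authors' methods: the class and field names are hypothetical, and because the response gives no numeric cut-offs for thrombocytopenia, creatinine, or liver enzymes, those findings are taken as already-adjudicated yes/no flags.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Findings:
    """Hypothetical container for the findings named in the revised definition."""
    # One (systolic, diastolic) pair in mmHg per measurement occasion.
    bp_readings: List[Tuple[float, float]] = field(default_factory=list)
    urine_protein_mg_24h: Optional[float] = None
    protein_creatinine_ratio: Optional[float] = None
    # No numeric thresholds are given in the text, so these are plain flags.
    thrombocytopenia: bool = False
    elevated_creatinine: bool = False
    elevated_liver_enzymes: bool = False
    pulmonary_edema: bool = False
    unresponsive_new_onset_headache: bool = False


def is_hypertensive(readings: List[Tuple[float, float]]) -> bool:
    """Systolic >= 140 or diastolic >= 90 mmHg on at least two occasions."""
    elevated = [1 for systolic, diastolic in readings
                if systolic >= 140 or diastolic >= 90]
    return len(elevated) >= 2


def has_proteinuria(f: Findings) -> bool:
    """>= 300 mg per 24-hour collection, or protein/creatinine ratio >= 0.3."""
    by_collection = (f.urine_protein_mg_24h is not None
                     and f.urine_protein_mg_24h >= 300)
    by_ratio = (f.protein_creatinine_ratio is not None
                and f.protein_creatinine_ratio >= 0.3)
    return by_collection or by_ratio


def meets_pe_definition(f: Findings) -> bool:
    """Hypertension combined with one or more of the listed findings."""
    other_findings = (
        has_proteinuria(f)
        or f.thrombocytopenia
        or f.elevated_creatinine
        or f.elevated_liver_enzymes
        or f.pulmonary_edema
        or f.unresponsive_new_onset_headache
    )
    return is_hypertensive(f.bp_readings) and other_findings
```

      Under this sketch, hypertension alone does not satisfy the definition, which is the point the reviewer raised about the original wording.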

      (5) I believe that Figures 1a and 1b are data from a previously published RNAseq dataset, though it is not entirely clear in the text. The methods section does not include a description of the analysis of these data undertaken here. It would be helpful to include at least a brief description of the study these data are taken from - how many samples, how were the PE/control groups defined, gestational age range, where is it from, etc. For the heatmap presented in B, what is the significance of the other genes/ why are they being shown? If the purpose of these two panels is to show differential expression specifically of ACVR2A in this dataset, that could be shown more directly.

      Clarification of RNAseq dataset: The Methods section has been revised to specify the dataset source (GEO accession number: GSE114691), which includes 20 PE and 21 control placental samples with gestational ages ranging from 34 to 38 weeks. PE and control groups were defined using clinical criteria such as hypertension and proteinuria, and these details have also been added to the Results section.

      RNAseq analysis description: We have included details of the differential gene expression analysis in the Methods section. Specifically, the DESeq2 R package was used, with thresholds of FDR < 0.05 and |log2(fold change)| ≥ 1. The selection of WNT pathway-related genes in Figure 1B is based on these analyses.

      Significance of the heatmap genes: The genes displayed in Figure 1B were selected based on their significant differential expression and enrichment in pathways relevant to PE pathogenesis, such as the WNT signaling pathway. We have clarified this in the Results section and updated the figure legend to explain their biological relevance.

      Purpose of Figures 1A and 1B: Figure 1A emphasizes the downregulation of ACVR2A in PE placentas, while Figure 1B complements this by presenting differentially expressed genes associated with the WNT pathway. These figures collectively highlight the role of ACVR2A in PE and its connection to broader molecular pathways. Text descriptions have been updated to improve clarity and focus.
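
      The two thresholds stated above (FDR < 0.05 and |log2(fold change)| ≥ 1) amount to a filter over a table of differential expression results. The sketch below shows that filter in Python, assuming the results have already been exported with DESeq2's standard `log2FoldChange` and `padj` columns. The gene names and values are placeholders invented for illustration, not data from GSE114691, and the function name is hypothetical.

```python
import math


def passes_thresholds(log2_fold_change, padj, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Apply the two cut-offs quoted in the response.

    padj may be missing (DESeq2 reports NA for some genes); such genes
    are treated here as not significant, which is an assumption.
    """
    if padj is None or math.isnan(padj):
        return False
    return padj < fdr_cutoff and abs(log2_fold_change) >= lfc_cutoff


# Placeholder rows, invented purely to exercise the filter.
results = [
    {"gene": "GENE_A", "log2FoldChange": -1.4, "padj": 0.003},
    {"gene": "GENE_B", "log2FoldChange": 0.6, "padj": 0.001},         # effect too small
    {"gene": "GENE_C", "log2FoldChange": 2.1, "padj": 0.2},           # not significant
    {"gene": "GENE_D", "log2FoldChange": 1.0, "padj": float("nan")},  # padj missing
]

selected = [row["gene"] for row in results
            if passes_thresholds(row["log2FoldChange"], row["padj"])]
```

      Only the first placeholder row satisfies both conditions. Note that a fold-change cut-off applied after testing is a reporting filter rather than a change to the statistical test itself.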

      (6) More information is needed in the methods section to understand how the immunohistochemistry was quantified. "Quantitation was performed" is all that is provided. Was staining quantified across the whole image or only in anchoring villous areas? How were HRP & hematoxylin signals distinguished in ImageJ? How was the overall level of HRP/DAB development kept constant between the NC and PE groups?

      Thank you for pointing out the need for more details regarding the quantification of immunohistochemistry (IHC). We have now clarified and expanded the description of the IHC quantification process in the Methods section as follows:

      Quantification Across the Entire Section: IHC staining was assessed across the entire tissue section to account for global expression patterns. For quantitative analysis, representative regions from the anchoring villous areas, where ACVR2A expression is most prominent, were selected for comparison between NC and PE groups. This ensured that the analysis focused on biologically relevant regions.

      ImageJ Analysis: Images of stained sections were captured under identical magnifications and lighting conditions. Hematoxylin (blue, nuclear staining) and DAB/HRP (brown, protein-specific signal) were distinguished using ImageJ's color deconvolution plugin. The DAB/HRP signal was isolated and quantified based on the integrated optical density (IOD) within the selected regions.

      Consistency in HRP/DAB Development: To maintain consistency between NC and PE groups, all tissue samples were processed under identical experimental conditions, including the same antibody dilution, incubation times, and DAB/HRP development durations. Negative controls (without primary antibody) were included to monitor background staining, and the DAB reaction was stopped simultaneously across all samples to avoid overdevelopment.

      Statistical Analysis: The quantified DAB signal intensity was normalized to the area of the selected regions, and comparisons between NC and PE groups were performed using statistical tests (e.g., Student’s t-test). Results are reported as mean ± SD. We hope this additional detail addresses your concerns.
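
      The area normalization described above can be illustrated with a small calculation. The sketch below is not the authors' ImageJ workflow. It assumes an 8-bit DAB channel that has already been separated by color deconvolution, and it uses the common uncalibrated conversion OD = log10(255 / intensity); the function names and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def optical_density(intensity, max_intensity=255.0):
    """Convert one 8-bit pixel intensity to optical density.

    Intensities are clamped to [1, max_intensity] so a fully dark pixel
    does not produce log10 of a division by zero.
    """
    clamped = min(max(float(intensity), 1.0), max_intensity)
    return math.log10(max_intensity / clamped)


def area_normalized_iod(region_pixels):
    """Integrated optical density of a region divided by its pixel area."""
    if not region_pixels:
        raise ValueError("region contains no pixels")
    integrated = sum(optical_density(p) for p in region_pixels)
    return integrated / len(region_pixels)


def summarize(values):
    """Return (mean, sample SD), matching the 'mean ± SD' reporting style."""
    return mean(values), stdev(values)


# Invented per-section values, only to show how the pieces fit together.
example_group = [area_normalized_iod([200, 180, 150]),
                 area_normalized_iod([210, 190, 170])]
group_mean, group_sd = summarize(example_group)
```

      Dividing by pixel count makes regions of different size comparable, which is why identical imaging and development conditions across groups matter: the conversion assumes the same unstained background level for every section.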

      (7) In Figure 1E it is not immediately obvious to many readers where the EVT are. It is probably worth circling or putting an arrow to the little region of ACVR2A+ EVT that is shown in the higher magnification image in Figure 1E. These are actually easier to see in the pictures provided in the supplement Figure 1. Of note, the STB is also staining positive. This is worth pointing out in the results text.

      Thank you for your suggestion regarding Figure 1E. To make the location of the ACVR2A+ extravillous trophoblasts (EVTs) more apparent, we have updated Figure 1E by adding arrows to indicate the regions of EVTs in the higher magnification image. Additionally, we have included annotations in the supplemental Figure S1 to further aid visualization. We appreciate your observation that syncytiotrophoblasts (STBs) also show positive staining for ACVR2A. We have revised the Results section to explicitly mention this finding and its potential significance.

      (8) It is not possible to judge whether the IF images in 1F actually depict anchoring villi. The DAPI is really faint, and it's high magnification, so there isn't a lot of context. Would it be possible to include a lower magnification image that shows where these cells are located within a placental section? It is also somewhat surprising that this receptor is expressed in the cytoplasm rather than at the cell surface. How do the authors explain this?

      Thank you for your suggestion to provide more context for the immunofluorescence (IF) images in Figure 1F. To address this, we have included lower magnification images in Supplementary Figure S2, showing the overall structure of the placental section and the location of the anchoring villi. These images help to contextualize the regions analyzed in Figure 1F, which were selected to clearly illustrate ACVR2A expression in extravillous trophoblasts (EVTs). In Figure 1F, we have focused on higher magnification images for better visualization of ACVR2A staining patterns in EVTs. Regarding the subcellular localization of ACVR2A, the receptor is predominantly expressed on the cell surface, as shown in our images. However, some intracellular staining is also observed, which may reflect receptor trafficking or recycling processes, consistent with the behavior of other activin receptors under physiological or pathological conditions. We have clarified these points in the Results and Discussion sections.

      (9) The results text makes it sound like the data in Figure 2A are from NCBI & Protein atlas, but the legend says it is qPCR from this lab. The methods do not detail how these various cell lines were grown; only HTR-SVNeo cell culture is described. Similarly, JAR cells are used for several experiments and their culture is not described.

      Thank you for pointing out the need for clarification regarding Figure 2A and cell culture methods. The data in Figure 2A were generated using RT-qPCR conducted in our laboratory, not solely based on data from NCBI or the Human Protein Atlas. We have revised the Results section to reflect this more accurately. Regarding the culture conditions, we acknowledge that the methods for other cell lines were not explicitly detailed. For this study, all cell lines, including JAR and other cancer cell lines, were cultured following standard protocols provided by the suppliers. Specifically, JAR cells and other cell lines were purchased from Wuhan Punosei Life Technology and were maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin under standard conditions (37°C, 5% CO₂). This information has been added to the Methods section for clarity.

      (10) Under RT-qPCR methods, the phrase "cDNA reverse transcription cell RNA was isolated..." does not make any sense.

      Thank you for pointing out the unclear phrasing in the RT-qPCR methods section. We agree that the original description was not precise. To address this, we have revised the relevant section to improve clarity and accuracy. Specifically, the methods now explicitly describe the two key steps: RNA isolation and cDNA synthesis. The revised text reads: Total RNA was isolated from cells using a Total RNA Extraction Kit (TIANGEN, China) following the manufacturer’s instructions. The extracted RNA was reverse-transcribed into complementary DNA (cDNA) using a cDNA Synthesis Kit (Takara, Japan) according to the protocol provided by the manufacturer.

      (11) The paragraph beginning "Consequently, a potential association..." is quite confusing. It mentions analyzing ACVR2A expression in placentas, but then doesn't point to any results of this kind and repeats describing the results in Figure 2a, from various cell lines.

      Thank you for your comment regarding the paragraph beginning with "Consequently, a potential association...". We understand that the current wording may create confusion. The primary aim of this section is to compare ACVR2A expression levels across various cell lines, including trophoblast-derived and non-trophoblast cell lines, to highlight the relevance of ACVR2A in trophoblast function, particularly in invasion and migration. To address your concerns, we have revised the paragraph for clarity and logical flow. The updated text explicitly focuses on the comparison of ACVR2A expression across cell lines (Figure 2A) and how this supports the hypothesis that ACVR2A plays a key role in trophoblast invasion and migration. Additionally, the discussion of placental samples has been separated to avoid confusion with cell line results. We hope this revision resolves the issue.

      (12) The authors should acknowledge that the effect of the ACVR2A knockout on proliferation makes it difficult to draw any conclusions from the trophoblast invasion assays. That is, there might be fewer migrating or invading cells in the knockout lines because there are fewer cells, not because the cells that are there are less invasive. Since this is a central conclusion of the study, it is a major drawback.

      Thank you for highlighting this important point. We agree that the reduced proliferation observed in ACVR2A knockout cells could influence the results of the invasion assays, as fewer cells may inherently lead to reduced invasion. To minimize this effect, we conducted the invasion and migration assays under low-serum conditions (1–2% serum) to limit cell proliferation during the experimental timeframe. This approach was based on optimization trials and existing literature, as serum-free conditions were found to negatively impact cell viability and experimental reproducibility. While these efforts helped to mitigate the impact of proliferation on the results, we acknowledge this as a limitation of our study and have added this discussion to the manuscript. Future studies could incorporate approaches such as normalizing cell numbers or using additional proliferation-independent methods to confirm the findings. We hope this clarification and the steps taken address your concerns.

      (13) The legend and the methods section do not agree on how many fields were selected for counting in the transwell invasion assays in Figure 3C. The methods section and the graph do not match the number of replicate experiments in Figure 3D (the number of replicate experiments isn't described for 3C).

      Thank you for pointing out the inconsistencies regarding the number of fields counted and the number of replicates in the Transwell invasion assays (Figure 3C) and colony formation assays (Figure 3D). We apologize for the lack of clarity in the Methods section and figure legend. To address this, we have revised both the figure legends and the Methods section for consistency and added detailed descriptions. For Figure 3C, cell invasion was quantified by randomly selecting 5 fields of view per sample under 300× magnification. Images shown in the figure were taken at lower magnification to provide a better visual comparison between experimental and control groups. For Figure 3D, each experiment was independently repeated at least 10 times to ensure robust and reproducible results. These clarifications have been incorporated into the revised manuscript. We appreciate your feedback and believe this revision improves the clarity and transparency of our methods.

      (14) Discussion says "Transcriptome sequencing analysis revealed low ACVR2A expression in placental samples from PE patients, consistent with GWAS results across diverse populations." The authors should explain this briefly. Why would SNPs in ACVR2A necessarily affect levels of the transcript?

      Thank you for raising this important point. We acknowledge that our study did not directly investigate how SNPs in the ACVR2A gene affect transcript levels. However, prior studies have suggested that SNPs can influence gene expression through various mechanisms. For example, SNPs in regulatory regions (e.g., promoters, enhancers, or untranslated regions) may affect transcription factor binding, RNA stability, or splicing efficiency, ultimately altering transcript levels. While we did not directly assess the functional consequences of ACVR2A SNPs in this study, the observed downregulation of ACVR2A in PE placentas aligns with the potential regulatory impact of SNPs previously identified in GWAS. To address this, we have revised the Discussion section to clarify the relationship between SNPs and transcript levels and acknowledge this limitation.

      (15) "The expression levels of ACVR2A mRNA were comparable to those of tumor cells such as A549. This discovery suggested a potential pivotal role of ACVR2A in the biological functions of trophoblast cells, especially in the nurturing layer." Alternatively, ACVR2A expression resembles that of tumors because the cell lines used here are tumor cells (JAR) or immortalized cells (HTR8). These lines are widely used to study trophoblast properties, but the discussion should at least acknowledge the possibility that the behavior of these cells does not always resemble normal trophoblasts.

      Thank you for pointing out this important limitation. We agree that the JAR and HTR8/SVneo cell lines, being tumor-derived or immortalized, may not fully replicate the behavior of normal trophoblast cells. While these cell lines are widely used as models for studying trophoblast properties due to their ease of culture and invasive behavior, their gene expression and signaling pathways could partially reflect their tumorigenic or immortalized origins. We have revised the Discussion section to acknowledge this limitation and clarify the interpretation of ACVR2A expression levels in these cells.

      (16) The authors should discuss some of what is known about the relationship between the TCF7/c-JUN pathway and the major signaling pathway activated by ACVR2A, Smad 2/3/4. The Wnt and TGFB family cross-talk is quite complex and it has been studied in other systems.

      Thank you for highlighting the relationship between the TCF7/c-JUN pathway and Smad2/3/4 signaling. In our study, we chose to focus on Smad1/5 due to its strong association with ACVR2A in placental development, as demonstrated in a recent study (DOI: 10.1038/s41467-021-23571-5). This study showed that the BMP signaling pathway, mediated through ACVR2A-Smad1/5, is essential for endometrial receptivity and embryo implantation. While Smad2/3/4 are well-established mediators of TGF-β signaling, Smad1/5 activation is more directly linked to ACVR2A in the context of reproductive biology.

      In PE placentas, we observed a significant downregulation of Smad1/5 expression, which supports the hypothesis that ACVR2A-mediated Smad signaling is disrupted in this condition. Although we did not directly assess Smad2/3/4 in this study, prior research has shown that Smad2/3 can interact with TCF/LEF transcription factors to regulate Wnt-related target genes, suggesting potential cross-talk between these pathways. We have now clarified this rationale and included a discussion of these interactions in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Several points need to be addressed to improve the clarity and robustness of the presented findings:

      (1) From a clinical perspective, several concerns arise regarding the interpretation of these findings. First, the small sample size of 20 patients may not be representative of the broader population, limiting the generalizability of the results. Additionally, although no significant differences in age and pre-pregnancy BMI were observed between the PE and normal control groups, other clinical variables, such as hypertension or gestational diabetes, may also influence ACVR2A expression and contribute to PE development. Furthermore, while the study suggests a correlation between reduced ACVR2A expression and PE, it remains unclear whether this association holds true across different subtypes of PE or whether there are other underlying clinical factors that could account for these changes in gene expression. These factors need to be considered in future studies to better understand the clinical relevance of ACVR2A in PE.

      Thank you for raising these insightful concerns about the clinical interpretation of our findings. We agree that the small sample size of 20 patients may limit the generalizability of our results. To address this, we are actively expanding our cohort by collecting additional clinical samples from PE patients and normotensive controls. This effort aims to strengthen the robustness of our findings and provide stronger evidence for the role of ACVR2A in PE. We would also like to clarify that, during the initial sample collection, we specifically included only PE patients without comorbidities such as gestational diabetes, chronic hypertension, or other pregnancy-related complications. This strict selection criterion was implemented to minimize the potential influence of confounding clinical variables and ensure that our findings specifically reflect the association between ACVR2A expression and PE. While our study provides important initial insights, we recognize the need for larger-scale studies to validate these findings. The ongoing collection of clinical samples will allow us to address this limitation and enhance the translational relevance of our research. We have revised the manuscript to reflect these points and highlight our plans to strengthen the study by increasing the sample size.

      (2) The section "Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9" in the results contains some issues with expression details. The results section should be more structured, with data presented in a more detailed and clear manner, ensuring that there is a clear connection between each experimental step and its corresponding result. For example, the sentence "Following multiple rounds of monoclonal culture, genotype identification, RT-qPCR and Western blotting (WB) analysis for screening, specific double-knockout monoclonal cell lines were distinctly chosen" contains redundant phrasing and unnecessary details, which affect the flow of the text.

      Thank you for your constructive feedback on the “Precision Genome Surgery: ACVR2A Knockout via CRISPR/Cas9” section. We agree that this section can be better structured to present the data in a more detailed and coherent manner. To address this, we have reorganized the results into distinct steps, ensuring a clear connection between each experimental step and its corresponding result. Redundant phrasing has been removed to improve the flow and readability of the text. The revised section emphasizes the purpose of each step, the screening process, and the specific results obtained.

      (3) The figure legends and panel labels in Figure 3 should be revised to ensure clarity and consistency. The figure legend should specify the exact panels (e.g., Figure 3A, 3B, 3C, etc.) and clearly describe the experimental conditions and results shown in each part.

      Thank you for pointing out the need for improved clarity and consistency in the figure legends and panel labels for Figure 3. We have revised the figure legend to specify each panel (e.g., Figure 3A, 3B, 3C, etc.) and included detailed descriptions of the experimental conditions and results displayed in each part. These updates aim to ensure better understanding and alignment between the figure legend and the panels.

      (4) Lack of In Vivo Validation of ACVR2A Knockout: The study does not include in vivo experiments to validate the effects of ACVR2A knockout. It would be important to investigate whether similar regulatory effects of ACVR2A on trophoblast cell migration and invasion can be observed in animal models or in larger clinical studies. The lack of in vivo data raises questions about the translational relevance of the findings.

      Thank you for highlighting the importance of in vivo validation to assess the translational relevance of our findings. While we acknowledge that in vivo experiments could provide additional insights into the role of ACVR2A in trophoblast migration and invasion, this study was primarily designed as an in vitro investigation to explore the molecular mechanisms underlying ACVR2A function in trophoblast cells. The choice of an in vitro model allowed us to perform precise and controlled mechanistic analyses, which are critical for establishing a foundation for future research. We agree that in vivo studies using animal models or larger clinical cohorts are important next steps to validate the regulatory effects of ACVR2A on trophoblast function and its contribution to PE pathogenesis. These directions will be pursued in future research to further establish the translational potential of our findings. We have included this perspective in the revised Discussion section.

      (5) TCF7/c-JUN Pathway in Clinical Samples: In the study of the TCF7/c-JUN pathway, the authors mention assessing protein expression in clinical samples through immunohistochemistry (IHC). However, the manuscript does not provide a clear explanation of how the findings from laboratory cell models (such as HTR8/SVneo and JAR) relate to the clinical samples. Specifically, while ACVR2A knockout is shown to affect these proteins at the cellular level, it is unclear whether this effect is observed in clinical samples. Therefore, further validation of the TCF7/c-JUN pathway in the cell models and exploration of its relationship with protein expression in clinical samples is necessary. Additional experiments, such as immunofluorescence staining or mass spectrometry, could further confirm the role of the TCF7/c-JUN pathway in cells and provide a more direct comparison with clinical data.

      Thank you for highlighting the need to connect findings from cell models to clinical samples, particularly with respect to the TCF7/c-JUN pathway. In response to your comment, we conducted additional experiments using Western blot analysis to evaluate the expression of ACVR2A, SMAD1/5, SMAD4, pSMAD1/5/9, and TCF7L1/TCF7L2 in PE placental tissues compared to normotensive controls (Figure 7A). The results demonstrated significantly reduced expression of these proteins in PE placentas, providing evidence that disruptions in the ACVR2A-SMAD and TCF7/c-JUN signaling pathways observed in vitro are also present in clinical samples.

      These findings strengthen the translational relevance of our study by directly linking the molecular mechanisms identified in cell models to clinical observations. We have updated the Results and Discussion sections to incorporate these new data, and we believe this addition addresses your concern about the relationship between in vitro and clinical findings.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      The authors have constructively responded to previous referee comments and I believe that the manuscript is a useful addition to the literature. I particularly appreciate the quantitative approach to social behavior, but have two cautionary comments.

      (1) Conceptually it is important to further justify why this particular maximum entropy model is appropriate. Maximum entropy models have been applied across a dizzying array of biological systems, including genes, neurons, the immune system, as well as animal behavior, so would seem quite beneficial to explain the particular benefits here, for mouse social behavior as coarse-grained through the eco-hab chamber occupancy. This would be an excellent chance to amplify what the models can offer for biological understanding, particularly in the realm of social behavior

      We thank the reviewer for this comment. Maximum entropy models, along with other statistical inference methods that learn interaction patterns from simultaneously-measured degrees of freedom, help distinguish various types of interactions, e.g. direct vs. indirect interactions among animals, individual preference to food vs. social interaction with pairs. As research on social behavior expands from focusing on pairs of animals to studying groups in (semi-)naturalistic environments, maximum entropy models serve as a crucial link between high-throughput data and the need to identify and distinguish interaction rules. Specifically, among all possible maximum entropy models, the pairwise maximum entropy model is one of the simplest that can describe interactions among individuals, which serves as an excellent starting point to understand collective and social behavior in animals.
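
      For orientation, one common way to write a pairwise maximum entropy model for chamber occupancy is sketched below. The parameterization is an assumption made for illustration and is not quoted from the manuscript: \sigma_i denotes the chamber occupied by mouse i, h_i is a hypothetical single-mouse chamber preference, J_{ij} is a hypothetical pairwise coupling that rewards (or penalizes) mice i and j being in the same chamber, and Z is the normalization constant.

```latex
P(\sigma_1, \dots, \sigma_N) = \frac{1}{Z}
  \exp\!\left( \sum_{i=1}^{N} h_i(\sigma_i)
             + \sum_{i<j} J_{ij}\, \delta_{\sigma_i, \sigma_j} \right)
```

      In a model of this form, individual preferences (h_i) and pairwise interactions (J_{ij}) are separate terms, which is what allows the two contributions to be told apart when both are fit to the same occupancy data.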

      Although the Eco-HAB setup currently records spatially coarse-grained data, it still provides more spatial information compared to the traditional three-chamber tests used to assess sociability for rodents. By showing that the maximum entropy model can effectively analyze Eco-HAB data, we hope to highlight its potential in research of social behavior in animals.

      To amplify what the models can offer for biological understanding, particularly in the realm of social behavior, we have updated the Introduction to give a more logical structure to the argument for using maximum entropy models to identify interactions among mice. Additionally, we updated the first paragraph of the Discussion to make explicit that it is the use of maximum entropy models that identifies interaction patterns from the high-throughput data. Finally, we have also added arguments to the Discussion (lines 422-425) supporting the specific use of pairwise maximum entropy models to study social behaviors.

      (2) Maximum entropy models of even intermediate size systems involve a large number of parameters. The authors are transparent about that limitation here, but I still worry that the conclusion of the sufficiency of pairwise interactions is simply not general, and this may also relate to the differences from previous work. If, as the authors suggest in the discussion, this difference is one of a choice of variables, then that point could be emphasized. The suggestion of a follow up study with a smaller number of mice is excellent.

      We thank the reviewer for raising this issue and agree that the caveat about how generally pairwise interactions can describe the social behavior of animals needs to be discussed. We have added a sentence in the Discussion to point out this important caveat. “More generally, this discrepancy when looking at different choices of variables raises the issue that when studying social behavior of animals in a group, it is important to test and compare interaction models with different complexity (e.g. pairwise or with higher-order interactions).” We have also toned down our conclusion to limit our results of pairwise interactions describing mice co-localization patterns to the data collected in Eco-HAB (also see Reviewer 3 Major Point 2).

      Reviewer #3 (Public review):

      Summary:

      Chen et al. present a thorough statistical analysis of social interactions, more precisely, co-occupying the same chamber in the Eco-HAB measurement system. They also test the effect of manipulating the prelimbic cortex by using TIMP-1 that inhibits the MMP-9 matrix metalloproteinase. They conclude that altering neural plasticity in the prelimbic cortex does not eliminate social interactions, but it strongly impacts social information transmission.

      Strengths:

      The quantitative approach to analyzing social interactions is laudable and the study is interesting. It demonstrates that the Eco-HAB can be used for high throughput, standardized and automated tests of the effects of brain manipulations on social structure in large groups of mice.

      Weaknesses:

      A demonstration of TIMP-1 impairing neural plasticity specifically in the prelimbic cortex of the treated animals would greatly strengthen the biological conclusions. The Eco-HAB provides coarser spatial information compared to some other approaches, which may influence the conclusions.

      Recommendations for the authors:  

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Do the Authors have evidence that TIMP-1 was effective, as well as specific to the prelimbic cortex?

      We refer to the literature for the effectiveness and specificity of TIMP-1 to the prelimbic cortex.

      Specifically, the study by Okulski et al. (Biol. Psychiatry 2007) provides clear evidence that TIMP-1 plays a role in synaptic plasticity in the prefrontal cortex. They showed that TIMP-1 is induced in the medial prefrontal cortex (mPFC) following stimulation that triggers late long-term potentiation (LTP), a key model of synaptic plasticity. Overexpression of TIMP-1 in the mPFC blocked the activity of matrix metalloproteinases (MMPs) and prevented the induction of late LTP in vivo. Similar effects were observed with pharmacological inhibition of MMP-9 in vitro, reinforcing the idea that TIMP-1 regulates extracellular proteolysis as part of the plasticity mechanism in the prefrontal cortex. These findings confirm that TIMP-1 is both effective and active in this specific brain region.

      Further evidence comes from Puścian et al. (Mol. Psychiatry 2022), who used TIMP-1-loaded nanoparticles to influence neuronal plasticity in the amygdala. They found that TIMP-1 affected MMP expression, LTP, and dendritic morphology, showing its impact on synaptic modifications. More directly relevant, Winiarski et al. (Sci. Adv. 2025) demonstrated that injecting TIMP-1-loaded nanoparticles into the prelimbic cortex altered responses to social stimuli, further supporting the idea that TIMP-1 has region-specific effects on behavioral processes.

      We have also updated the main text (page 8, 1st paragraph of “Effect of impairing neuronal plasticity in the PL on subterritory preferences and sociability”) of the manuscript to include the above references.

      (2) The Authors seem to suggest that one main reason for the different results compared to Shemesh et al. 2013 was the coarseness of the Eco-HAB data. In this case, I think this conclusion should be toned down because of this significant caveat.

      We thank the reviewer for pointing this out, and agree that this caveat and difference should be emphasized. To tone down the conclusion, we have

      (1) added details about the Eco-HAB (it being coarse-grained, etc.) in the abstract.

      (2) added to the results summary in the Discussion (top of page 12) that the results are “within in the setup of the semi-naturalistic Eco-HAB experiments”

      (3) added to the Discussion (page 13) that the different results compared to Shemesh et al 2013 means that general studies of social behavior need to compare models with different levels of complexity (e.g. pairwise vs. higher-order interactions). (Also see Reviewer 2 Comment 2.)

      Minor points

      (1) Please explain what is measured in Fig. 1C (what is on the y axis?).

      Figure 1C shows the activity of the mice as measured by the rate of transitions, i.e. the number of times the mice switch boxes during each hour of the day, averaged over all N = 15 mice and T = 10 days (cohort M1). The error bars represent the variability of activity across individuals or across days. For mouse-to-mouse variability (blue), we first compute for each mouse its number of transitions averaged over the same hour for all 10 days, then compute the standard deviation of these averages across all 15 mice and plot it as the error bar. For day-to-day variability (orange), we first compute for each day the number of transitions for each hour averaged over all mice, then compute the standard deviation of these averages across all 10 days as the error bar. We have added this detailed explanation to the caption of Figure 1C.
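
      The two error-bar calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the array layout (mice × days × hours), and the use of the sample standard deviation (ddof=1) are all assumptions.

```python
import numpy as np


def transition_rate_variability(transitions):
    """Hourly mean transition rate with two kinds of error bars.

    transitions: array-like of shape (n_mice, n_days, n_hours) holding the
    number of box switches per mouse, per day, per hour (assumed layout).
    Returns (mean_rate, mouse_sd, day_sd), each of shape (n_hours,).
    """
    transitions = np.asarray(transitions, dtype=float)

    # Mean rate for each hour, averaged over all mice and all days.
    mean_rate = transitions.mean(axis=(0, 1))

    # Mouse-to-mouse variability: average each mouse over days first,
    # then take the standard deviation across mice.
    per_mouse = transitions.mean(axis=1)      # shape (n_mice, n_hours)
    mouse_sd = per_mouse.std(axis=0, ddof=1)  # ddof=1 is an assumption

    # Day-to-day variability: average each day over mice first,
    # then take the standard deviation across days.
    per_day = transitions.mean(axis=0)        # shape (n_days, n_hours)
    day_sd = per_day.std(axis=0, ddof=1)      # ddof=1 is an assumption

    return mean_rate, mouse_sd, day_sd
```

      For the cohort described above, the input would have shape (15, 10, 24). Averaging before taking the standard deviation is what separates the two sources of variability: differences between days are averaged out of the mouse-to-mouse estimate, and differences between mice are averaged out of the day-to-day estimate.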

      (2) In Fig. 3, it would be better to present the control group also in the main figure instead of the supplementary.

      We have merged Figure 3 and Figure 3 Supplementary 1 to present the control group also in the main figure.

      (3) In Fig. 3 and corresponding supplements, there seems to be a large difference between males and females. I think this would deserve some more discussion.

      While it is not the main focus of this paper, we agree with the reviewer that the difference between males and females is important and deserves attention in the Discussion and in future studies. We have therefore added a paragraph to the Discussion (lines 394-399, bottom of page 12).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this report, the authors made use of a murine cell line derived from a MYC-driven liver cancer to investigate the gene expression changes that accompany the switch from normoxic to hypoxic conditions during 2D growth and the switch from 2D monolayer to 3D organoid growth under normoxic conditions. They find a significant (ca. 40-50%) overlap among the genes that are dysregulated in response to hypoxia in 2D cultures and in response to spheroid formation. Unsurprisingly, hypoxia-related genes were among the most prominently deregulated under both sets of conditions. Many other pathways pertaining to metabolism, splicing, mitochondrial electron transport chain structure and function, DNA damage recognition/repair, and lipid biosynthesis were also identified.

      We thank this reviewer for his/her time and efforts, and the insightful comments.

      Major comments:

      (1) Lines 239-240: The authors state that genes involved in DNA repair were identified as being necessary to maintain survival of both 2D and 3D cultures (Figure S6A). Hypoxia is a strong inducer of ROS. Thus, the ROS-specific DNA damage/recognition/repair pathways might be particularly important. The authors should look more carefully at the various subgroups of the many genes that are involved in DNA repair. They should also obtain at least a qualitative assessment of ROS and ROS-mediated DNA damage by staining for total and mitochondrial-specific ROS using dyes such as CM-H2-DCFDA and MitoSox. Actual direct oxidative damage could be assessed by immunostaining for 8-oxo-dG and related to the sub-types of DNA damage-repair genes that are induced. The centrality of DNA damage genes also raises the question as to whether the previously noted prominence of the TP53 pathway (see point 5 below) might represent a response to ROS-induced DNA damage.

      We thank the reviewer for the insightful comments and agree that ROS induced by hypoxia could play a role in modulating DNA repair and, consequently, cellular essentiality. Although pathway enrichment in Figure S6A (now Figure 2-figure supplement 4A) showed that the DNA repair pathway was essential to cell survival in hypoxia and 3D cultures, the genes associated with this pathway (Ddb1, Brf2, Gtf3c5, Guk1, Taf6) are not typical DNA repair genes and are more likely involved in gene transcription. It would nevertheless be interesting to test whether they are specifically involved in the DNA damage response to ROS, although this is beyond the scope of this study.

      (2) Because most of the pathway differences that distinguish the various cell states from one another are described only in terms of their transcriptome variations, it is not always possible to understand what the functional consequences of these changes actually are. For example, the authors report that hypoxia alters the expression of genes involved in PDH regulation but this is quite vague and not backed up with any functional or empirical analyses. PDH activity is complex and regulated primarily via phosphorylation/dephosphorylation (usually mediated by PDK1 and PDP2, respectively), which in turn are regulated by prevailing levels of ATP and ADP. Functionally, one might expect that hypoxia would lead to the down-regulation of PDH activity (i.e. increased PDH-pSer392) as respiration changes from oxidative to non-oxidative. This would not be appreciated simply by looking at PDH transcript levels. This notion could be tested by looking at total and phospho-PDH by western blotting and/or by measuring actual PDH activity as it converts pyruvate to AcCoA.

      We agree with the reviewer that the regulation of PDH activity could be affected by multiple factors and that it merits further validation by other approaches.

      (3) Line 439: Related to the above point: the authors state: "It is likely that blockade of acetyl-CoA production by PDH knockout may force cells to use alternative energy sources under hypoxic and 3D conditions, averting the Warburg effect and promoting cell survival under limited oxygen and nutrient availability in 3D spheroids." This could easily be tested by determining whether exogenous fatty acids are more readily oxidized by hypoxic 2D cultures or spheroids than occurs in normoxic 2D cultures.

      We thank the reviewer for this suggestion. We apologize that we were not able to validate every hypothesis experimentally.

      (4) Line 472: "Hypoxia induces high expression of Acaca and Fasn in NEJF10 cells indicating that hypoxia promotes saturated fatty acid synthesis...The beneficial effect of Fasn and Acaca KO to NEJF10 under hypoxia is probably due to reduction of saturated fatty acid synthesis, and this hypothesis needs to be tested in the future.". As with the preceding comment, this supposition could readily be supported directly by, for example, performing western blots for these enzymes and by showing that incubation of hypoxic 2D cells or spheroids converted more AcCoA into lipid.

      We thank the reviewer for this suggestion. However, functional validation of the Fasn and Acaca KO is beyond the scope of this study.

      (5) In Supplementary Figure 2B&C, the central hub of the 2D normoxic cultures is Myc (as it should well be) whereas, in the normoxic 3D, the central hub is TP53 and Myc is not even present. The authors should comment on this. One would assume that Myc levels should still be quite high given that Myc is driven by an exogenous promoter. Does the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?

      The predicted transcription factor activity analysis was based on the differential ATAC-seq peaks among the different cultures, identified through pairwise comparisons. The absence of TP53 or MYC from the network for a given condition does not mean that their activity was absent.

      “…the centrality of TP53 indicate that the cells within the spheroids are growth-arrested, being subjected to DNA damage and/or undergoing apoptosis?” This reviewer has raised an interesting question. We are investigating this hypothesis and hopefully we can give a clear answer in the future.

      (6) In the Materials and Methods section (lines 711-720), the description of how spheroid formation was achieved is unclear. Why were the cells first plated into non-adherent 96 well plates and then into non-adherent T75 flasks? Did the authors actually utilize and expand the cells from 144 T75 flasks and did the cells continue to proliferate after forming spheroids? Many cancer cell types will initially form monolayers when plated onto non-adherent surfaces such as plastic Petri dishes and will form spheroid-like structures only after several days. Other cells will only aggregate on the "non-adherent" surface and form spheroid-like structures but will not actually detach from the plate's surface. Have the authors actually documented the formation of true, non-adherent spheroids at 2 days and did they retain uniform size and shape throughout the collection period? The single photo in Supplementary Figure 1 does not explain when this was taken. The authors include a schematic in Figure 2A of the various conditions that were studied. A similar cartoon should be included to better explain precisely how the spheroids were generated and clarify the rationale for 96 well plating. Overall, a clearer and more concise description of how spheroids were actually generated and their appearance at different stages of formation needs to be provided.

      The cells were initially plated in non-adherent 96-well plates to facilitate the formation of spheroids in a controlled and uniform manner. As correctly mentioned by the reviewer, during the initial stages, cells cultured on non-adherent surfaces often form aggregates or clumps, and it takes a few days for them to develop into solid spheroids.

      In our study, we aimed to achieve 3D spheroid formation immediately following the transduction process to allow for screening under both 2D and 3D conditions. Plating the cells into 96-well plates enabled us to monitor and control the formation of spheroids in smaller volumes before scaling up the culture in non-adherent T75 flasks for subsequent experimental steps. This setup allows us to maintain gene editing processes under both 2D and 3D conditions.

      Regarding the proliferation and uniformity of spheroids:

      • Yes, the spheroids continued to proliferate after their formation.

      • True, non-adherent spheroids were documented as early as the next day. This was visually confirmed under microscopy, and size uniformity was maintained throughout the collection period by following optimized culture protocols.

      We also agree with the reviewer’s suggestion to include a cartoon schematic similar to Figure 2A, illustrating the spheroid generation process and clarifying the rationale for using 96-well plates. We have included such a cartoon, together with a spheroid growth curve monitored by Incucyte, as Figure 2-figure supplement 2.

      (7) The authors maintained 2D cultures in either normoxic or hypoxic (1% O2) states during the course of their experiments. On the other hand, 3D cultures were maintained under normoxic conditions, with the assumption that the interiors of the spheroids resemble the hypoxic interiors of tumors. However, the actual documentation of intra-spheroid hypoxia is never presented. It would be a good idea for the authors to compare the degree of hypoxia achieved by 2D (1% O2) and 3D cultures by staining with a hypoxia-detecting dye such as Image-iT Green. Comparing the fluorescence intensities in 2D cultures at various O2 concentrations might even allow for the construction of a "standard curve" that could serve to approximate the actual internal O2 concentration of spheroids. This would allow the authors to correlate the relative levels of hypoxia between 2D and 3D cultures.

      This is an excellent idea, and we will certainly pursue it in our future experiments.
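
      As a rough illustration of the "standard curve" the reviewer describes, the sketch below maps a measured dye fluorescence onto an approximate O2 concentration by linear interpolation between 2D calibration points. It is a hypothetical helper, not part of the study: the function name and arguments are assumptions, no calibration values are implied, and it assumes fluorescence changes monotonically with O2 over the calibrated range.

```python
import numpy as np


def estimate_o2_percent(calib_o2_percent, calib_fluorescence, measured_fluorescence):
    """Approximate O2 (%) from dye fluorescence using a 2D-culture standard curve.

    calib_o2_percent:      O2 concentrations at which 2D cultures were imaged.
    calib_fluorescence:    mean fluorescence measured at each of those O2 levels.
    measured_fluorescence: fluorescence measured in the sample of interest
                           (for example, a spheroid interior).
    """
    o2 = np.asarray(calib_o2_percent, dtype=float)
    fl = np.asarray(calib_fluorescence, dtype=float)

    # np.interp requires increasing x values, so sort the calibration
    # points by fluorescence before interpolating.
    order = np.argsort(fl)

    # Linear interpolation; values outside the calibrated range are clamped
    # to the nearest calibration point rather than extrapolated.
    return np.interp(measured_fluorescence, fl[order], o2[order])
```

      Clamping at the ends of the calibrated range is a deliberate choice here: an estimate outside the O2 levels that were actually measured would be an extrapolation and should not be reported as a concentration.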

      (8) Related to the previous 2 points, the authors performed RNAseq on spheroids only 48 hours after initiating 3D growth. I am concerned that this might not have been a sufficiently long enough time for the cells to respond fully to their hypoxic state, especially given my concerns in Point 6. Might the results have been even more robust had the authors waited longer to perform RNA seq? Why was this short time used?

      We agree with the reviewer. We are not certain that 48 hours was the ideal timepoint. A longitudinal experiment that harvests samples at several timepoints may be necessary in future work.

      (9) What happens to the gene expression pattern if spheroids are re-plated into standard tissue culture plates after having been maintained as spheroids? Do they resume 2D growth and does the gene expression pattern change back?

      This is a great question; we had not considered what the gene expression pattern would be if spheroids were re-plated in 2D. This could be a challenging experiment because the gene expression and epigenetic changes are time-dependent. However, the cells do grow well after being re-plated in 2D.

      (10) Overall, the paper is quite descriptive in that it lists many gene sets that are altered in response to hypoxia and the formation of spheroids without really delving into the actual functional implications and/or prioritizing the sets. Some of these genes are shown by CRISPR screening to be essential for maintaining viability although in very few cases are these findings ever translated into functional studies (for example, see points 1-4 above). The list of genes and gene pathways could benefit from a better explanation and prioritization of which gene sets the authors believe to be most important for survival in response to hypoxia and for spheroid formation.

      This was a genome-wide study that integrated RNA-seq, ATAC-seq and CRISPR KO, providing a resource for understanding the oncogenic pathways in different culture conditions. We believe we have clearly articulated the important genes/pathways in our abstract.

      (11) The authors used a single MYC-driven tumor cell line for their studies. However, in their original paper (Fang, et al. Nat Commun 2023, 14: 4003.) numerous independent cell lines were described. It would help to know whether RNAseq studies performed on several other similar cell lines gave similar results in terms of up- and down-regulated transcripts (i.e. are NEJF10 cells representative of the other cell lines?).

      We have not generated RNA-seq data for these cell lines cultured in different conditions.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Fang et al. provides a tour-de-force study uncovering cancer cells' varied dependencies on several gene programs for their survival under different biological contexts. The authors addressed genomic differences in 2D vs 3D cultures and how hypoxia affects gene expression. They used a Myc-driven murine liver cancer model grown in 2D monolayer culture in normoxia and hypoxia as well as cells grown as 3D spheroids and performed a CRISPR-based genome-wide KO screen to identify genes that play important roles in cell fitness. Some context-specific gene effects were further validated by in-vitro and in-vivo gene KO experiments.

      Strengths:

      The key findings in this manuscript are:

      (1) Close to 50% of differentially expressed genes were common between 2D Hypoxia and 3D spheroids conditions but they had differences in chromatin accessibility.

      (2) VHL-HIF1a pathway had differential cell fitness outcomes under 2D normoxia vs 2D hypoxia and 3D spheroids.

      (3) Individual components of the mitochondrial respiratory chain complex had contrasting effects on cell fitness under hypoxia.

      (4) Knockout of organogenesis or developmental pathway genes led to better cell growth specifically in the context of 3D spheroids and knockout of epigenetic modifiers had varied effects between 2D and 3D conditions.

      (5) Another key program that affects cell fitness outcomes in normoxia vs hypoxia is lipid and fatty acid metabolism.

      (6) Prmt5 is a key essential gene under all growth conditions, but in the context of 3D spheroids even partial loss of Prmt5 has a synthetic lethal effect with Mtap deletion and Mtap is epigenetically silenced specifically in the 3D spheroids.

      We appreciate this reviewer for acknowledging the strengths of our study.

      Issues to address:

      (1) The authors should clarify the link between the findings of the enrichment of TGFb-SMAD signaling REACTOME pathway to the findings that knocking out TGFb-SMAD pathway leads to better cell fitness outcomes for cells in the 3D growth conditions.

      We have clarified this link in the abstract by saying “Notably, multicellular organogenesis signaling pathways including TGFb-SMAD, which is upregulated in 3D culture, specifically constrict the uncontrolled cell proliferation in 3D while inactivation of epigenetic modifiers (Bcor, Kmt2d, Mettl3 and Mettl14) has opposite outcomes in 2D vs. 3D.”

      (2) Supplementary Figure 4C has been cited in the text but doesn't exist in the supplementary figures section.

      Sorry for this typo. It should be 5C, which is Figure 2-figure supplement 3C in the new version of the MS. We have corrected it now.

      (3) A small figure explaining this ABC-Myc driven liver cancer model in Supplementary Figure 1 would be helpful to provide context.

      We appreciate this suggestion. We have added a cartoon as Figure 1-figure supplement 1A to indicate the procedure for generation of this model.

      (4) The method for spheroids formation is not found in the method section.

      We described the method in our previous publication (Nature Communications 2023 Jul 6;14(1):4003.). However, we have now added the information to the Methods, and the procedure is very simple (lines 623-624). We found that the murine liver cancer cell lines readily form spheroids when they are cultured in low-attachment dishes with standard DMEM complete media.

      (5) In Supplementary Figure 1b, the comparisons should be stated the opposite way - 3D vs 2D normoxia and 2D-Hypoxia vs 2D-Normoxia.

      We have corrected the legend of Figure S1B, which is now Figure 1B in the new version of the MS.

      (6) There are typos in the legend for Supplementary Figure 10.

      We have checked the typos.

      (7) Consider putting Supplementary Figure 1b into the main Figure 1.

      We have moved both Supplementary Figure 1a and 1b into main Figure 1 as Figure 1A and 1B. Hopefully, this will help readers find the information easily.

      (8) Please explain why only one timepoint (endpoint) was used for 3D spheroids in the CRISPR KO screen experiment, while several timepoints were used for 2D conditions. Was this for technical convenience?

      As this reviewer speculated, indeed this was for technical convenience. We found that it was technically challenging to split the spheroids for CRISPR screening.

      (9) In line 372, it is indicated that Bcor KO (Fig 5e) had growth advantage - this was observed in only one of the gRNA -- same with Kmt2d KO in the same figure where there was an opposite effect. Please justify the use of only one gRNA.

      We actually used 4 gRNAs for each gene. In the heatmap, although only one of the gRNAs for each gene showed some level of enrichment under the hypoxic 2D condition, they were all highly enriched in 3D.

      (10) Why was shRNA used for the PRMT5 gene rather than a CRISPR-based KO strategy? Note that one of the shRNAs for PRMT5 had almost no KO (PRMT5-shRNA2 Figure 7B) but still showed a phenotype (Figure 7D) - please explain.

      We used shRNA as a second approach for cross-validation. We agree that the knockdown efficiency of shRNA2 was not as good as the others, with only about 40% knockdown efficiency.

      (11) In Figure 7D, which samples (which shRNA group) were being compared to do the t-test?

      The comparisons were between shCtrl and each of the shPRMT5 groups. We have clarified this in the figure legend.

      (12) In line 240, it is stated that oxphos gene set is essential for NEJF10 cell survival in both normoxia and hypoxia conditions. But shouldn't oxphos be non-essential in hypoxia as cells move away from oxphos and become glycolytic?

      This is a great question. While hypoxia may indeed promote the switch from oxphos to glycolysis, several studies have shown that the low oxygen concentrations in hypoxic regions of tumors may not be limiting for oxphos, and that ATP is generated by oxphos in tumors even at very low oxygen tensions (please see review Clin Cancer Res (2018) 24 (11): 2482–2490.). We therefore speculate that NEJF10 cells were still dependent on oxphos for ATP production under hypoxia. However, this needs further investigation. We have added this discussion to our manuscript (lines 250-254).

      (13) In line 485 it is mentioned that Pmvk and Mvd genes which are involved in cholesterol synthesis when knocked out had a positive effect on cell growth in 3D conditions and since cholesterol synthesis is essential for cell growth how does this not matter much in the context of 3D - please explain.

      We thank this reviewer for this note. It seems that only two gRNAs for each gene were upregulated in 3D, which could be due to a technical issue or clonal selection. We have deleted this sentence in the new version of the MS.

      Reviewer #3 (Public review):

      Summary:

      In this study, Fang et al. systematically investigate the effects of culture conditions on gene expression, genome architecture, and gene dependency. To do this, they cultivate the murine HCC line NEJF10 under standard culture conditions (2D), then under similar conditions but under hypoxia (1% oxygen, 2D hypoxia) and under normoxia as spheroids (3D). NEJF10 was isolated from a murine HCC model that relies exclusively on MYC as a driver oncogene. In principle, (1) RNA-seq, (2) ATAC-seq and (3) genetic screens were then performed in this isogenic system and the results were systematically compared in the three cultivation methods. In particular, genome-wide screens with the CRISPR library Brie were performed very carefully. For example, in the 2D conditions, many different time points were harvested to control the selection process kinetically. The authors note differential dependencies for metabolic processes (not surprisingly, hypoxia signaling is affected) such as the regulation and activity of mitochondria, but also organogenesis signaling and epigenetic regulation.

      Strengths:

      The topic is interesting and relevant and the experimental set-up is carefully chosen and meaningful. The paper is well written. While the study does not reveal any major surprises, the results represent an important resource for the scientific community.

      We thank this reviewer for his/her positive comments.

      Weaknesses:

      However, this presupposes that the statistical analysis and processing are carried out very carefully, and this is where my main suggestions for revision begin. Firstly, I cannot find any information on the number of replicates in RNA- and ATAC-seq. This should be clearly stated in the results section and figure legends and cut-offs, statistical procedures, p-values, etc. should be mentioned as well. In principle, all NGS experiments (here ATAC- and RNA-seq) should be performed in replicates (at least duplicates, better triplicates) or the results should be validated by RT-PCR in independent biological triplicates. Secondly, the quantification of the analyses shown in the figures and especially in the legends is not sufficiently careful. Units are often not mentioned. Example Figure 4a: The legend says: 'gRNA reads' but how can the read count be -1? I would guess these are FC, log2FC, or Z-values. All figure legends need careful revision.

      Based upon the reviewer’s suggestions, we have added details about the replicates to the figure legends. For the gRNA read heatmap, the scale bar indicates the Z score. We have added this information to the figure legends.
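
      As a minimal sketch of why a heatmap scale can show values such as -1, the snippet below standardises one gRNA's read counts across conditions into Z scores. The read values, the function name `row_z_scores`, and the use of the population standard deviation are illustrative assumptions; the response does not state the exact normalisation the authors applied.

```python
from statistics import mean, pstdev


def row_z_scores(values):
    """Standardise one row (one gRNA across conditions) to Z scores.

    Assumption: population standard deviation is used; a constant row
    is mapped to zeros to avoid division by zero.
    """
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


# Hypothetical normalised read counts for one gRNA in three conditions
reads = [120.0, 100.0, 80.0]
z = row_z_scores(reads)
print(z)  # values above the row mean are positive, values below are negative
```

      Because each value is expressed relative to the row mean, conditions with fewer reads than average necessarily receive negative scores, even though raw read counts are never negative.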

      Furthermore, I would find a comparison of the sgRNA abundances at the earliest harvesting time with the distribution in the library interesting, to see whether and to what extent selection has already taken place before the three culture conditions were established (minor point).

      This is a great point. Unfortunately, we did not perform such an analysis.

      Recommendations for the authors:

      Reviewing Editor:

      There are three general issues:

      First, there is a lack of detail regarding much of the analysis. In some cases, this makes it difficult to assess the value of the data, albeit, there is generally a consensus the information is really interesting.

      Second, the findings - although provocative - lack mechanistic details and are focused more on descriptive findings. Hence, the manuscript would be improved by some effort at evaluating identified programs and providing some suggestions of mechanisms.

      Third, the authors need to put much more effort into the clarity and tightness of the presentation.

      We have made clarifications in response to the reviewers’ comments.

      Reviewer #1 (Recommendations for the authors):

      Figure S1C. the labeling of the lower x-axis is inverted.

      Due to space limitations, we changed the figure orientation in the old version of the MS. We have rotated the figure back in the new version, where it is now Figure 1-figure supplement 1B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The authors address the role of the centromere histone core in force transduction by the kinetochore.

      Strengths:

      They use a hybrid DNA sequence that combines CDEII and CDEIII as well as Widom 601 so they can make stable histones for biophysical studies (provided by the Widom sequence) and maintain features of the centromere (CDE II and III).

      Weaknesses:

      The main results are shown in one figure (Figure 2). Indeed the Centromere core of Widom and CDE II and III contribute to strengthening the binding force for the OA-beads. The data are very nicely done and convincingly demonstrate the point. The weakness is that this is the entire paper. It is certainly of interest to investigators in kinetochore biology, but beyond that, the impact is fairly limited in scope.

      This reviewer might have missed that this is a Research Advance, not an article. Research Advances are limited in scope by definition and provide a new development that builds on research reported in a prior paper. They can be of any length. Our Research Advance builds on our prior work, Hamilton et al., 2020 and provides the new result that native centromere sequences strengthen the attachment of the kinetochore to the nucleosome.

      Reviewer #2:

      Summary:

      This paper provides a valuable addendum to the findings described in Hamilton et al. 2020 (https://doi.org/10.7554/eLife.56582). In the earlier paper, the authors reconstituted the budding yeast centromeric nucleosome together with parts of the budding yeast kinetochore and tested which elements are required and sufficient for force transmission from microtubules to the nucleosome. Although budding yeast centromeres are defined by specific DNA sequences, this earlier paper did not use centromeric DNA but instead the generic Widom 601 DNA. The reason is that it has so far been impossible to stably reconstitute a budding yeast centromeric nucleosome using centromeric DNA.

      In this new study, the authors now report that they were able to replace part of the Widom 601 DNA with centromeric DNA from chromosome 3. This makes the assay more closely resemble the in vivo situation. Interestingly, the presence of the centromeric DNA fragment makes one type of minimal kinetochore assembly, but not the other, withstand stronger forces.

      We thank the reviewer for their careful and positive assessment of our work.

      Which kinetochore assembly turned out to be affected was somewhat unexpected, and can currently not be reconciled with structural knowledge of the budding yeast centromere/kinetochore. This highlights that, despite recent advances (e.g. Guan et al., 2021; Dendooven et al., 2023), aspects of budding yeast kinetochore architecture and function remain to be understood and that it will be important to dissect the contributions of the centromeric DNA sequence.

      We couldn’t agree more.

      Given the unexpected result, the study would become yet more informative if the authors were able to pinpoint which interactions contribute to the enhanced force resistance in the presence of centromeric DNA.

      Strength:

      The paper demonstrates that centromeric DNA can increase the attachment strength between budding yeast microtubules and centromeric nucleosomes.

      Weakness:

      How centromeric DNA exerts this effect remains unclear.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Additional specific mutants would be helpful in interpreting the effect observed. The authors speculate that a small segment of OA near the DNA (based on Dendooven et al., 2023) could be important. Would it be possible to introduce specific mutations and test this?

      This would be an interesting study but is far beyond the scope of a Research Advance. In fact, it would make a nice thesis project for a new student. Although perhaps not obvious, these studies require a large set of reagents including wrapped nucleosomes, which must be made fresh (they cannot be frozen) and five purified recombinant complexes, purified by specialized protocols that maintain their activity. Moreover, each datapoint is gathered one at a time. For example, the data in Figure 2 in this manuscript includes 343 datapoints acquired one at a time over the course of 1.5 years.  

      (2) Please provide the sequences of the other CEN3-W601 chimeras that were tested and did NOT stably wrap centromeric histone octamers. This may help others to design yet different constructs in the future. (Maybe the information is there and I didn't see it?)

      We fully agree and thank the reviewer for this excellent suggestion. The sequences and summaries of their wrapping stability are now provided in Table 3, page 17.

      (3) I wonder whether the authors tested the C0N3 sequence used in Dendooven et al., 2023. If not, could it be tested? This would more tightly couple the functional assay shown here with the structural work.

      We did not test the CON3 sequence, which was published several years after the start of this work. We agree that a tight coupling between the functional assay and the structural work would be useful. However, we also see the advantage of being able to go beyond the structural work and include even more CEN3 sequence than has so far been possible in the structural work.  

      In addition to measuring the role of DNA sequence in Okp1/Ame1 attachment to the nucleosome, we were interested in the role of DNA sequence in the attachment of Mif2. Therefore, we included all 35 bp of the Mif2 footprint in our chimeric CCEN DNA sequence. CON3 only includes 8 bp from CDEII. We did produce stable nucleosomes using CEN3-601 from Guan et al. (see Table 3). Again, CEN3-601 only includes 8 bp of the Mif2 footprint so we opted to study nucleosomes wrapped in our CCEN DNA with the entire Mif2 footprint. Curiously we found that even the entire Mif2 footprint was not enough to find the DNA sequence specificity seen in the EMSA experiments reported by Xiao et al., 2017.

      To help readers understand the differences between all these constructs, we have included them in Table 3.

      (4) Would an AlphaFold 3 prediction of the assemblies used in this paper be feasible and useful?

      The structures of the Dam1 complex (Jenni et al., 2018), Ndc80 complex (Zahm, et al., 2023 and references therein), MIND complex (Dimitrova et al., 2016), OA complex (Dendooven et al., 2023), and the nucleosome (Xiao et al., 2017; Yan et al., 2019; Guan et al., 2021; Dendooven et al., 2023) are published. The interactions between many of these complexes are understood beyond the level that AlphaFold3 could provide (Dimitrova et al., 2016; Dendooven et al., 2023). One of the main questions is how Mif2 interacts with the nucleosome and the other components of the kinetochore. Even structural analyses that included Mif2 in the assembly detect little or no Mif2 in the final structure. Unfortunately, AlphaFold3 is also not helpful as it predicts only the structure of the dimerization domain, which was already known (Cohen et al., 2008).

      AlphaFold3 predicts the rest of Mif2 is largely unstructured with several alpha helices predicted with low confidence.

      (5) Given that the centromeric DNA piece included should be able to bind the CBF3 complex, would it be possible to add this complex and test the effect on force transmission?

      This would be an interesting experiment, and we do expect CBF3 to bind. As stated above, this is far beyond the scope of this Research Advance. In our experience, with each new kinetochore subcomplex that we add into our reconstitutions, there are new challenges purifying the subcomplex in active form and in sufficient quantity. We are eager to add CBF3 but this is not something we can pull off in the context of this Research Advance. Thank you again for the time and energy spent reviewing our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to analyse the roles of the teichoic acids of Streptococcus pneumoniae in supporting the maintenance of the periplasmic region. Previous work has proposed the periplasm to be present in Gram positive bacteria and here an advanced electron microscopy approach was used. This also showed a likely role for both wall and lipo-teichoic acids in maintaining the periplasm. Next, the authors use a metabolic labelling approach to analyse the teichoic acids. This is a clear strength as this method cannot be used for most other well studied organisms. The labelling was coupled with super-resolution microscopy to be able to map the teichoic acids at the subcellular level and a series of gel separation experiments to unravel the nature of the teichoic acids and the contribution of genes previously proposed to be required for their display. The manuscript could be an important addition to the field but there are a number of technical issues which somewhat undermine the conclusions drawn at the moment. These are shown below and should be addressed. More minor points are covered in the private Recommendations for Authors.

      Weaknesses to be addressed:

      (1) l. 144 Was there really only one sample that gave this resolution? Biological repeats of all experiments are required.

      CEMOVIS is a very challenging method that is not amenable to numerous repeats. However, multiple images were recorded from at least two independent samples for each strain. Additional sample images are shown in a new Fig. S3.

      CETOVIS is even more challenging (only two publications in Pubmed since 2015) and was performed on a single ultrathin section that, exceptionally, laid perfectly flat on the EM grid, allowing tomography data acquisition on ∆tacL cells. The reconstructed tomogram confirmed the absence of a granular layer in the depth of the section. Additionally, the numbering of Fig. S4A-B (previously misidentified as Fig. S2A-B) has been corrected in the text of V2.

      (2) Fig. 4A. Is the pellet recovered at "low" speeds not just some of the membrane that would sediment at this speed with or without LTA? Can a control be done using an integral membrane protein and Western Blot? Using the tacL mutant would show the behaviour of membranes alone.

      We think that the pellet is not just some of the membrane but most of it. In support of this view, the “low” speed pellets after enzymatic cell lysis contain not just some membrane lipids, but most of them (Fig. S10A). We therefore expect membrane proteins to be also present in this fraction. We performed a Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). Unfortunately, no signal was detected, most likely due to protein degradation by contaminant proteases that we could trace to the purchased mutanolysin. The same sedimentation properties were observed with the ∆tacL strain as shown in Fig. 6A. However, in the ∆tacL strain the membrane pellet still contains membrane-bound TA precursors. It is therefore impossible to test definitively whether pneumococcal membranes totally devoid of TA would sediment in the same way.

      (3) Fig. 4A. Using enzymatic digestion of the cell wall and then sedimentation will allow cell wall associated proteins (and other material) to become bound to the membranes and potentially affect sedimentation properties. This is what is in fact suggested by the authors (l. 1000, Fig. S6). In order to determine if the sedimentation properties observed are due to an artefact of the lysis conditions a physical breakage of the cells, using a French Press, should be carried out and then membranes purified by differential centrifugation. This is a standard, and well-established method (low-speed to remove debris and high-speed to sediment membranes) that has been used for S. pneumoniae over many years but would seem counter to the results in the current manuscript (for instance Hakenbeck, R. and Kohiyama, M. (1982), Purification of Penicillin-Binding Protein 3 from Streptococcus pneumoniae. European Journal of Biochemistry, 127: 231-236).

      Thank you for this suggestion. We have tested this hypothesis by breaking cells with a Microfluidizer followed by differential centrifugation. This experiment, which requires a substantial minimal volume, was performed with unlabeled cells (due to the cost of reagents) and assessed by Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). In this case, the majority of the membrane material was found in the high-speed pellet, as expected.

      We also applied the spheroplast lysis procedure of Flores-Kim et al. to the labeled cells, and found that most of the labeled material sedimented at low speed (new Fig. S7B), as observed with our own procedure.

      With these new results, the section on membrane density has been removed from the Supplementary Information. Instead, the fractionation is further discussed in terms of size of membrane fragments and presence of intact spheroplasts in the notes in Supplementary Information preceding Fig. S7.

      (4) l. 303-305. The authors suggest that the observed LTA-like bands disappear in a pulse chase experiment (Fig. 6B). What is the difference between this and Fig. 5B, where the bands do not disappear? Fig. 5C is the WT and was only pulse labelled for 5 min and so would one not expect the LTA-like bands to disappear as in 6B?

      Fig. 6B shows a pulse-chase experiment with strain ∆tacL, whereas Fig. 5C shows a similar experiment with the parental WT strain. The disappearance of the LTA-like band pattern with the ∆tacL strain (Fig. 6B), and their persistence in the WT strain (Fig. 5C), indicate that these bands are the undecaprenyl-linked TA in ∆tacL and proper LTA in the WT. A sentence has been added to better explain this point in V2.

      Note that we have exchanged the previous Fig. 5C and Fig. S13B, so that the experiments of Fig. 5A and 5C are in the same medium, as suggested by Reviewer #2.

      (5) Fig. 6B, l. 243-269 and l. 398-410. If, as stated, most of the LTA-like bands are actually precursor then how can the quantification of LTA stand as stated in the text? The "Titration of Cellular TA" section should be re-evaluated or removed? If you compare Fig. 6C WT extract incubated at RT and 110 °C it seems like a large decrease in amount of material at the higher temperature. Thus, the WT has a lot of precursors in the membrane? This needs to be quantified.

      Indeed, the quantification of the ratio of LTA and WTA in the WT strain rests on the assumption that the amount of membrane-linked polymerized TA precursors is negligible in this strain. This assumption is now stated in the Titration section. We think it is the case. The true LTA and TA precursors do not have exactly the same electrophoretic mobility, being shifted relative to each other by about half a ladder “step”. This difference is visible when samples are run in adjacent lanes on the same gel, as in the new Fig. 6C. The difference in migration was well documented in the original paper about the deletion of tacL, although tacL was known as rafX at that time, and the ladders were misidentified as WTA (Wu et al. 2014. A novel protein, RafX, is important for common cell wall polysaccharide biosynthesis in Streptococcus pneumoniae: implications for bacterial virulence. J Bacteriol. 196, 3324-34. doi: 10.1128/JB.01696-14). This reference was added in V2. The experiment in the new Fig. 6C was repeated to have all samples on the same gel and treated at a lower temperature. The minor effect on the amount of LTA when WT cells are heated at pH 4.2 may be due to the removal of some labeled phosphocholine. We have NMR evidence that the phosphocholine in position D is labile to acidic treatment of LTA, and it may be lacking in some cases, as reported by Hess et al. (Nat Commun. 2017 Dec 12;8(1):2093. doi: 10.1038/s41467-017-01720-z).

      (6) L. 339-351, Fig. 6A. A single lane on a gel is not very convincing as to the role of LytR. Here, and throughout the manuscript, wherever statements concerning levels of material are made, quantification needs to be done over appropriate numbers of repeats and with densitometry data shown in SI.

      Yes indeed. Apart from the titration of TA in the WT strain, we haven’t yet carried out a thorough quantification of TA or LTA/WTA ratio in different strains and conditions, although we intend to do so in a follow-up study, using the novel opportunities offered by the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments performed in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14. The value of 51% was a calculation error, and was corrected to 41%. Likewise, the decrease in the WTA/LTA ratio was corrected to 5 to 7-fold.

      (7) 14. l. 385-391. Contrary to the statement in the text, the zwitterionic TA will have associated counterions that result in net neutrality. It will just have both -ve and +ve counterions in equal amounts (dependent on their valency), which doesn't matter if it is doing the job of balancing osmolarity (rather than charge).

      Thank you for pointing this out. The paragraph has been corrected in V2.

      Reviewer #2 (Public review):

      The Gram-positive cell wall consists in large part of TAs, and is essential for most bacteria. However, TA biosynthesis and regulation is highly understudied because of the difficulties in working with these molecules. This study closes some of our important knowledge gaps related to this and provides new and improved methods to study TAs. It also shows an interesting role for TAs in maintaining a 'periplasmic space' in Gram positives. Overall, this is an important piece of work. It would have been more satisfying if the possible causal link between TAs and periplasmic space would have been more deeply investigated with complemented mutants and CEMOVIS. For the moment, there is clearly something happening but it is not clear if this only happens in TA mutants or also in strains with capsules/without capsules and in PG mutants, or in lafB (essential for production of another glycolipid) mutants. Finally, some very strong statements are made suggesting several papers in the literature are incorrect, without actually providing any substantiation/evidence supporting these claims. Nevertheless, I support the publication of this work as it pioneers some new methods that will definitively move the field forward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) l. 55 It is stated that TA are generally not essential. This needs to be introduced in a little more detail as in several species they are collectively. Need some more references here to give context.

      We have expanded the paragraph and added a selection of references in V2.

      (2) l. 63 and Fig. 1A. Is the model based on the images from this paper? Is the periplasm as thick as the peptidoglycan layer? Would you not expect the density of WTA to be the same throughout the wall, rather than less inside? Do the authors think that the TA are present as rods in the cell envelope and because of this the periplasm looks a little like a bilayer, is this so? Is the relative thickness of the layers based on the data in the paper (Table 1)?

      The model proposed in Fig. 1A is not based on our data. It is a representation of the model proposed by Harold Erickson, and the appropriate reference has been added to the figure legend in V2. We do not speculate on the relative density of WTA inside the peptidoglycan layer, at the surface or in the periplasm. The only constraint from the model is that the density of WTA in the periplasm should be sufficient for self-exclusion and allow the brush polymer theory to apply. The legend has been amended in V2.

      We indeed think that the bilayer appearance of the periplasmic space in the wild type strain, and the single layer periplasmic space in the ∆tacL and ∆lytR strains, support Erickson’s model. Although the model was drawn arbitrarily, it turns out that the relative thickness of the peptidoglycan and periplasmic space is in rough agreement with the measurements reported in Table 1.

      (3) Fig. 2. It is hard to orient oneself to see the layers. The use of the term periplasmic space (l. 132) and throughout is probably not wise as it is not a space.

      We prefer to retain this nomenclature since the term periplasmic space has been used in all the cell envelope CEMOVIS publications and is at the core of Erickson’s hypothesis about these observations and teichoic acids.

      (4) L. 147. This is not referring to Fig. S2A-B as suggested but Fig. S3A-B.

      This has been corrected.

      (5) l. 148. How do you know the densities observed are due to PG or certainly PG alone? Perhaps it is better to call this the cell wall.

      Yes. Cell wall is a better nomenclature and the text and Table 1 have been corrected in V2, in accordance with Fig. 2.

      (6) l. 165. It is also worth noting that peripheral cell wall synthesis also happens at the same site so this may well not be just division.

      Yes. We have replaced “division site” by “mid-cell” in V2.

      (7) l. 214 What is the debris? If PG digestion has been successful then there will be marginal debris. Is this pellet translucent (like membranes)? If you use fluorescently labelled PG in the preparation has it all disappeared, as would be expected by fully digested and solubilised material?

      In traditional protocols of bacterial membrane preparation, a low-speed centrifugation is first performed to discard “debris” that to our knowledge have not been well characterized but are thought to consist of unbroken cells and large fragments of cell wall. After enzymatic degradation of the pneumococcal cell wall, the low-speed pellet is not translucent as in typical membrane pellets after ultracentrifugation, but is rather loose, unlike a dense pellet of unbroken cells. A description of the pellet appearance was added in V2.

      It is a good idea to check if some labeled PG is also pelleted at low-speed after digestion. In a double labeling experiment using azido-choline and a novel unpublished metabolic probe of the PG, we found that the PG was fully digested and labeled fragments migrated as a couple of fuzzy bands likely corresponding to different labeled peptides. These species were not pelleted at low speed.

      (8) l. 219. Can you give a reference to certify that the low mobility material is WTA? Why does it migrate differently than LTA? Or is the PG digestion not efficient?

      WTA released from sacculi by alkaline lysis were found to migrate as a smear at the top of native gels revealed by alcian-blue silver staining, which is incompatible with SDS (Flores-Kim, 2019, 2022). The references have been added in V2. It could be argued in this case that the smearing was due to partial degradation of the WTA by the alkaline treatment.

      Bui et al. (2012) reported the preparation of WTA by enzymatic digestion of sacculi, but the resulting WTA were without muropeptide, presumably due to a step of boiling at pH 5 used to deactivate the enzymes.

      To our knowledge, this is the first report of pneumococcal WTA prepared by digestion of sacculi and analyzed by SDS-PAGE. Since the migration of WTA in native and SDS-PAGE is similar, we hypothesize that they do not interact significantly with the dodecyl sulphate, in contrast to the LTA, which bear a lipidic moiety. The fuzziness of the WTA migration pattern may also result from the greater heterogeneity due to the attached muropeptide, such as different lengths (di-, tetra-saccharide…), different peptides despite the action of LytA (tri-, tetra-peptide…), different O-acetylation status, etc.

      (9) L. 226-227, Fig S8. Presumably several of the major bands on the Coomassie stained gel are the lysozyme, mutanolysin, recombinant LytA, DNase and RNase used to digest the cell wall etc.? Can the sizes of these proteins be marked on the gel. Do any of them come down with the material at low-speed centrifugation?

      We have provided a gel showing the different enzymes individually and mixed (new Fig. S9G). While performing several experiments of this type, we found that the mutanolysin might be contaminated with proteases. The enzymes do not appear to sediment at low speed.

      (10) Fig. S9B. It is difficult to interpret what is in the image as there appear to be 2 populations of material (grey and sometimes more raised). Does the 20,000 g material look the same?

      Fig. S10B is a 20,000 × g pellet. We agree that there appear to be two types of membrane vesicles, but we do not know their nature.

      (11) l. 277 and Fig. 5A. Why is it "remarkable" that there are apparently more longer LTA molecules as the cell reach stationary phase?

      This is the first time that a change of TA length has been documented. Such a change could conceivably have consequences for the binding and activity of CBPs and the physiology of the cell envelope in general. These questions should be addressed in future studies.

      (12) l. 280. How do you know which is the 6-repeat unit?

      It is an assumption based on previous analyses by Gisch et al. (J Biol Chem 2013, 288(22):15654-67. doi: 10.1074/jbc.M112.446963). The reference was added.

      (13) Fig. 5A and C. Panel C, the cells were grown in a different medium and so are not comparable to Panel A. Why is Fig. S12B not substituted for 5B? Presumably these are exponential phase cells.

      We have swapped Fig. S13B and 5C in V2, as suggested, and changed the text and legends accordingly.

      Reviewer #2 (Recommendations for the authors):

      L30: vitreous sections?

      Corrected in V2.

      L32: as their main universal function --> as a universal function. To show it's the main universal function, you will need to look at this across various bacterial species.

      Changed to “possible universal function” in V2.

      L35: enabled the titration the actual --> titration of the actual?

      Corrected in V2.

      L34: consider breaking up this very long sentence.

      Done in V2.

      L37: may compensate the absence--> may compensate for the absence.

      Corrected in V2.

      L45: Using metabolic labeling and electrophoresis showed --> Metabolic labeling and...

      Corrected in V2.

      L46: This finding casts doubts on previous results, since most LTA were likely unknowingly discarded in these studies. This needs to be rephrased and is unnecessarily callous. While the current work casts doubts on any quantitative assessments of actual LTA levels measured in previous studies, it does not mean any qualitative assessments or conclusions drawn from these experiments are wrong. Better would be to say: These findings suggest that previously reported quantitative assessments of LTA levels are likely underestimating actual LTA levels, since much of the LTA would have been unknowingly discarded.

      If the authors do think that actual conclusions are wrong in previous work, then they need to be more explicit and explain why they were wrong.

      Yes indeed. The statement was toned down in V2.

      L55: Although generally non-essential. I would remove or rephrase this statement. I don't think any TA mutant will survive out in the wild and will be essential under a certain condition. So perhaps not essential for growth under ideal conditions, but for the rest pretty essential.

      The paragraph was amended by qualifying the essentiality to laboratory conditions and including selected references.

      L95: Note that the prevailing model until reference 20 (Gibson and Veening) was that the TA is polymerized intracellularly (see e.g. Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026). This intracellular polymerisation model seemed unlikely according to Gibson and Veening ('As TarP is classified by PFAM as a Wzy-type polymerase with predicted active site outside the cell, we speculate that TarP and TarQ polymerize the TA extracellularly in contrast to previous reports.'), but there is no experimental evidence as far as this referee knows of either model being correct.

      Despite the lack of experimental evidence, we think that Gibson and Veening are very likely correct, based on their argument, and also by analogy with the synthesis of other surface polysaccharides from undecaprenyl- or dolichol-linked precursors. It is unfortunate that Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026 was published in this way, since there was no evidence for a cytoplasmic polymerization, to our knowledge.

      L97: It is commonly believed, although I'm not sure it has ever been shown, that the capsule is covalently attached at the same position on the PG as WTA. Therefore, there must be some sort of regulation/competition between capsule biosynthesis and WTA biosynthesis (see also ref. 21). The presence of the capsule might thus also influence the characteristics of the periplasmic space. Considering that by far most pneumococcal strains are encapsulated, the authors should discuss this and why a capsule mutant was used in this study and how translatable their study using a capsule mutant is to S. pneumoniae in general.

      A paragraph was added in the Introduction of V2 to present the complication and a sentence was added at the end of the discussion to mention that this should be studied in the future.

      L102: Ref 29 should probably be cited here as well?

      Since in Ref 29 (Flores-Kim et al. 2019) there is a detectable amount of LTA (presumably TA precursors) in the ∆tacL strain, we prefer to cite only Hess et al. 2017 regarding the absence of LTA in the absence of TacL. However, we added in V2 a reference to Flores-Kim et al. 2019 in the following paragraph regarding the role of the LTA/WTA ratio.

      L106: dependent on the presence of the phosphotransferase LytR (21). --> dependent on the presence of the phosphotransferase LytR, whose expression is upregulated during competence (21).

      Corrected in V2.

      L119: I fail to see how the conclusions drawn by other groups (I assume the authors mean work from the Vollmer, Rudner, Bernhardt, Hammerschmidt, Havarstein, Veening groups?) are invalid if they compared WTA:LTA ratios between strains and conditions if they underestimated the LTA levels? Supposedly, the LTA levels were underestimated in all samples equally so the relative WTA/LTA ratio changes will qualitatively give the same outcome? I agree that these findings will allow for a reassessment of previous studies in which presumably too low LTA levels were reported, but I would not expect a difference in outcome when people compared WTA:LTA ratios between strains?

      The sentence was rephrased in V2 to be neutral regarding previous work and rather emphasize future possibilities.

      L131: Perhaps it would be good to highlight that such a conspicuous space has been noticed before by other EM methods (see e.g. Figs.4 and 5 or ref 19, or one of the most clear TEM S. pneumoniae images I have seen in Fig. 1F of Gallay et al, Nat. Micro 2021). However, always some sort of staining had previously been performed so it was never clear this was a real periplasmic space. CEMOVIS has this big advantage of being label free and imaging cells in their presumed native state.

      Thanks for pointing out these beautiful data that we had overlooked. We have added a few sentences and references in the Discussion of V2.

      L201: References are not numbered.

      Corrected in V2.

      L271/L892: Change section title. 'Evolution' can have multiple meanings. It would be more clear to write something like 'Increased TA chain length in stationary phase cells' or something like that.

      Changed in V2.

      L275: harvested

      Corrected in V2.

      L329: add, as suggested shown previously (I guess refs 24 and 29)

      Reference to Hess et al. 2017 has been added in V2. A sentence and further references to Flores-Kim, 2019, 2022 and Wu et al. 2014 were added at the end of the discussion with respect to the LTA-like signal observed in these studies of ∆tacL strains.

      L337: I think a concluding sentence is warranted here. These experiments demonstrate that membrane-bound TA precursors accumulate on the outside of the membrane, and are likely polymerized on the outside as well, in line with the model proposed in ref. 20.

      From the point of view of formal logic, the accumulation of membrane-bound TA precursors on the outer face of the membrane does not prove that they were assembled there. They could still be polymerized inside and translocated immediately. However, since this is extremely unlikely for the reasons discussed by Gibson and Veening, we have added a mild conclusion sentence and the reference in V2.

      L343: How accurate are these quantifications? Just by looking at the gel, it seems there is much less WTA in the lytR mutant than 50% of the wild type?

      Yes, the 51% value was a calculation error. This was changed to 41%. Likewise, the decrease of the WTA amount relative to LTA was corrected to 5- to 7-fold.

      Apart from the titration of TA in the WT strain, we have not yet carried out a careful quantification of either TA or the LTA/WTA ratio in different strains and conditions, although we intend to do so in the near future using the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments of growth in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14.

      L342: although WTA are less abundant and LTA appear to be longer (Fig. 6A). --> although WTA are less abundant and LTA appear to be longer (Fig. 6A), in line with a previous report showing that LytR is the major enzyme mediating the final step in WTA formation (ref. 21) (or something like that). Perhaps better is to start this paragraph differently. For instance: Previous work showed that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). As shown in Fig. 6A, the proportion of WTA significantly decreased in the lytR mutant. However, there was still significant WTA present indicating that perhaps another LCP protein can also produce WTA.

      Changed in V2.

      Of note, WTA levels would be a lot lower in encapsulated strains as used in Ref. 21 (assuming WTA and capsule compete for the same linkage on PG). So perhaps it would be hard to detect any residual WTA in an encapsulated lytR mutant?

      Investigation of the relationship between TA and capsule incorporation or O-acetylation is definitely a future area of study using this method of TA monitoring.

      L371: see my comments related to L131. Some TEM images clearly show the presence of a periplasmic space.

      Comments and references have been added in V2.

      L402: It would be really interesting to perform these experiments on a wild type encapsulated strain. Would these have much more LTA? (I understand you cannot do these experiments perhaps due to biosafety, but it might be interesting to discuss).

      Yes. It would be interesting to compare the TA in D39 and D39 ∆cps strains. We have added this perspective at the end of the discussion in V2.

      L418: ref lacks number

      Corrected in V2.

      L423: refs missing.

      References added in V2.

      L487: See my comments regarding L46. I do not see one valid point in the current paper why underestimating LTA levels would change any of the conclusions drawn in Ref. 21. I do not know the other papers cited well enough, but it seems highly unlikely that their conclusions would be wrong by systematically underestimating LTA levels. As far as I understand it, this current work basically confirms the major conclusions drawn by these 'doubtful' papers (that TacL makes LTA and LytR is the main WTA producer). As such, I find this sentence highly unfair without precisely specifying what the exact doubts are. Sure, this current paper now shows that probably people have discarded unknowingly LTA and therefore underestimated LTA levels, so any quantitative assessment of LTA levels are probably wrong. That is one thing. But to say this casts doubts on these studies is very serious and unfair (unless the authors provide good arguments to support these serious claims).

      Yes indeed. The sentence was rephrased to be strictly factual in V2.

      Table 2: I assume these strains are delta cps? Would be relevant to list this genotype.

      Table 2 was completed in V2.

      The authors should comment on why the mutants have not been complemented, especially for lytR as it's the last gene in a complex operon. It would be great to see WTA levels being restored by ectopic expression of LytR.

      Yes. We think this could be part of an in-depth study of the attachment of WTA, together with the investigation of the other LCP phosphotransferases.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study characterizes the role of sulfakinin and the sulfakinin receptor 1 in changes in olfactory responses associated with foraging versus mating behavior in the oriental fruit fly (Bactrocera dorsalis), a significant agricultural pest. This pathway regulates food consumption and mating receptivity in other species; here the authors use genetic disruption of sulfakinin and sulfakinin receptor 1 to provide strong evidence that changes in sulfakinin signaling modulate antennal responses to food versus pheromonal cues and alter the expression of ORs that detect relevant stimuli.

      Strengths:

      The authors utilize multiple complementary approaches including CRISPR/Cas9 mutagenesis, behavioral characterization, electroantennograms, RNA sequencing and heterologous expression to convincingly demonstrate the involvement of the sulfakinin pathway in the switch between foraging and mating behaviors. The use of both sulfakinin peptide and receptor mutants is a strength of the study and implicates specific signaling actors.

      Weaknesses:

      The authors demonstrate that SKR is expressed in olfactory neurons, however there are additional potential sites of action that may contribute to these results.

      Recommendations for the authors:

      The authors have addressed most of the issues raised by the reviewers. Below are a few outstanding issues.

      (1) Lines 68-69 describe "control of B. dorsalis include the use of the behavioral responses to semiochemicals" but does not describe what these responses are or how behavior is modulated.

      The sentence was revised to “Control of B. dorsalis include the use of the reproductive and feeding behavioral responses to semiochemicals” (line 69 in the revision).

      (2) Statistical analysis for 9 hour starved females at 5 minutes is missing in Figure 1D and S1.

      We have added statistical analysis for 9 hour starved females at 5 minutes in the revised Figures 1D and S1, respectively (line 578).

      (3) The legend in Figure S2 should be revised as it is not clear from the figure which of the odors are food associated odors.

      As suggested, we added a food odor label to the revised Figure S2 (line 666).

      (4) Line 167: "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the pheromonal components, while down regulated genes, OR49a and OR63a, were activated by food volatiles." Based on the data, this sentence is incorrect - "Therefore, the upregulated OR genes in starved WT flies, OR7a.4, OR7a.8 and OR10a, were activated by the food components, whereas downregulated genes, OR49a and OR63a, were activated by pheromonal components."

      We are sorry for our mistake. We have corrected it (lines 168-169).

      (5) Line 192: "The coordinated action of sulfakinin on mutiple downstreams,..." should be revised to "downstream pathways or tissues" or simply removing "multiple downstream".

      As suggested, we removed “multiple downstream”. See line 192.

      (6) Reference formatting is inconsistent: see line 207 vs line 208.

      We have corrected it to “(Wu et al., 2019)” (line 207).

      (7) Lines 241-244 The broad discussion regarding the evolution and ancestral function of CCK here and the phylogeny in Figure S6 are peripheral to the authors claims.

      As suggested, we removed the section and Figure S6 in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research article by Nath et al. from the Lee Lab addresses how lipolysis under starvation is achieved by a transient receptor potential channel, TRPγ, in the neuroendocrine neurons to help animals survive prolonged starvation. Through a series of genetic analyses, the authors identify that TRPγ mutations specifically lead to a failure in lipolytic processes under starvation, thereby reducing animals' starvation resistance. The conclusion was confirmed through total triacylglycerol levels in the animals and lipid droplet staining in the fat bodies. This study highlights the importance of transient receptor potential (TRP) channels in the fly brain to modulate energy homeostasis and combat metabolic stress. While the data is compelling and the message is easy to follow, several aspects require further clarification to improve the interpretation of the research and its visibility in the field.

      Strengths:

      This study identifies the biological meaning of TRPγ in promoting lipolysis during starvation, advancing our knowledge about TRP channels and the neural mechanisms to combat metabolic stress. Furthermore, this study demonstrates the potential of the TRP channel as a target to develop new therapeutic strategies for human metabolic disorders by showing that metformin and AMPK pathways are involved in its function in lipid metabolisms during starvation in Drosophila.

      Weaknesses:

      Some key results that might strengthen their conclusions were left out for discussion or careful explanation (see below). If the authors could improve the writing to address their findings and connect their findings with conclusions, the research would be much more appreciated and have a higher impact in the field.

      Here, I listed the major issues and suggestions for the authors to improve their manuscript:

      (1) Are the increased lipid droplet size and the upregulated total TAG level measured in the starved or sated mutant in Figure 1? This information might be crucial for readers to understand the physiological function of TRP in lipid metabolism. In other words, clarifying whether the upregulated lipid storage is observed only in the starved trp mutant will advance our knowledge of TRPγ. If the increase of total TAG level is only observed in the starved animals, TRP in the Dh44 neurons might serve as a sensor for the starvation state required to promote lipolysis in starvation conditions. On the other hand, if the total TAG level increases in both starved and sated animals, activation of Dh44 through TRPγ might be involved in the lipid metabolism process after food ingestion.

      We measured total TAG levels in Figure 1 and LD sizes in Figure 2 under sated conditions. We inserted “under sated condition” to clarify this (lines 97 and 147-148).

      Thanks for your suggestions.

      (2) It is unclear how AMPK activation in Dh44 neurons reduces the total triacylglycerol (TAG) levels in the animals (Figure 3G). As AMPK is activated in response to metabolic stress, the result in Figure 3G might suggest that Dh44 neurons sense metabolic stress through AMPK activation to promote lipolysis in other tissues. Do Dh44 neurons become more active during starvation? Is activation of Dh44 neurons sufficient to activate AMPK in the Dh44 neurons without starvation? Is activation of AMPK in the Dh44 neurons required for Dh44 release and lipolysis during starvation? These answers would provide more insights into the conclusion in Lines 192-193.

      In our previous study, we demonstrated that trpγ mutants exhibited lower levels of glucose, trehalose and glycogen (Dhakal et al. 2022), and in the current study, we observed excessive lipid storage in the trpγ mutant, indicating imbalanced energy homeostasis. Given the established role of AMPK in maintaining energy balance (Marzano et al., 2021; Lin et al., 2021), we employed the activated form of AMPK (UAS-AMPK<sup>TD</sup>) in our experiments. Our results showed that expression of activated AMPK in Dh44 neurons led to a reduction in total TAG levels, suggesting that AMPK activation in these neurons can promote lipolysis even in the absence of starvation. Regarding the activation of Dh44 neurons, Dus et al. (2015) reported that Dh44 cells in the brain are activated by nutritive sugars, especially under starvation conditions. In addition, another report showed a role of Dh44 neurons in regulating starvation-induced sleep suppression (Oh et al., 2023), which may imply that these neurons become more active under starved conditions. Although we did not directly assess whether Dh44 neuron activity increases during starvation or whether AMPK activation in these neurons is required for DH44 release and subsequent lipolysis, our findings support the notion that AMPK activation in Dh44 neurons is sufficient to reduce TAG levels, potentially through the metabolic stress response typically observed during starvation. We explained it as follows: “Dh44 neurons regulate starvation-induced sleep suppression (Oh et. al., 2023), which implies that these neurons become more active under starved conditions.” (lines 190-191).

      (3) It is unclear how the lipolytic gene brummer is further downregulated in the trpγ mutant during starvation while brummer is upregulated in the control group (Figure 6A). This result implies that the trpγ mutant was able to sense the starvation state but responded abnormally by inhibiting the lipolytic process rather than promoting lipolysis, which makes it more susceptible to starvation (Figure 3B).

      Thanks for your suggestions. We explained it as follows: “The data indicates that the trpγ mutant can sense the starvation state but responds abnormally by suppressing lipolysis instead of activating it. This dysregulated lipolytic response likely increases the mutant's vulnerability to starvation, as it cannot effectively mobilize lipid stores for energy during periods of nutrient deprivation.” (lines 251-254).

      (4) There is an inconsistency of total TAG levels and the lipid droplet size observed in the Dh44 mutant but not in the Dh44-R2 mutant (Figures 7A and 7F). This inconsistency raises a possibility that the signaling pathway from Dh44 release to its receptor Dh44-R2 only accounts for part of the lipid metabolic process under starvation. Adding discussion to address this inconsistency may be helpful for readers to appreciate the finding.

      Thanks for your suggestion. We included the following in the Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the function of trpγ in lipid metabolism was investigated. The authors found that lipid accumulation levels were increased in trpγ mutants and remained high during starvation; the increased TAG levels in trpγ mutants were restored by the expression of active AMPK in DH44 neurons and oral administration of the anti-diabetic drug metformin. Furthermore, oral administration of lipase, TAG, and free fatty acids effectively restored the survival of trpγ mutants under starvation conditions. These results indicate that TRPγ plays an important role in the maintenance of systemic lipid levels through the proper expression of lipase. Furthermore, authors have shown that this function is mediated by DH44R2. This study provides an interesting finding in that the neuropeptide DH44 released from the brain regulates lipid metabolism through a brain-gut axis, acting on the receptor DH44R2 presumably expressed in gut cells.

      Strengths:

      Using Drosophila genetics, careful analysis of which cells express trpγ regulates lipid metabolism is performed in this study. The study supports its conclusions from various angles, including not only TAG levels, but also fat droplet staining and survival rate under starved conditions, and oral administration of substances involved in lipid metabolism.

      Weaknesses:

      Lipid metabolism in the gut of DH44R2-expressing cells should be investigated for a better understanding of the mechanism. Fat accumulation in the gut is not mechanistically linked with fat accumulation in the fat body. The function of lipase in the gut (esp. R2 region) should be addressed, e.g. by manipulating gut-lipases such as magro or Lip3 in the gut in the context of the trpγ mutant. Also, it is not clarified which cell types in the gut DH44R2 is expressed. The study also mentioned only in the text that bmm expression in the gut cannot restore lipid droplet enlargement in the fat body, but this result might be presented as a figure.

      We appreciate the reviewer’s insightful suggestions. Unfortunately, due to the unavailability of the reagent (UAS-Lip3), we were unable to manipulate gut lipase in trpγ mutants as proposed. However, we additionally performed immunostaining to examine the co-expression of trpγ and Dh44R2 in the gut, and our results indicate that both trpγ and Dh44R2 are co-expressed in the R2 region of the gut (Figure 7O and P). Furthermore, we have updated our figures to address the point that bmm expression in the gut does not restore lipid droplet enlargement in the fat body, with the revised version (Figure 5I and J).

      Reviewer #3 (Public Review):

      In this manuscript, the authors demonstrated the significance of the TRPγ channel in regulating internal TAG levels. They found high TAG levels in TRPγ mutant, which was ascribed to a deficit in the lipolysis process due to the downregulation of brummer (bmm). It was notable that the expression of TRPγ in DH44+ PI neurons, but not dILP2+ neurons, in the brain restored the internal TAG levels and that the knockdown of TRPγ in DH44+ PI neurons resulted in an increase in TAG levels. These results suggested a non-cell autonomous effect of Dh44+PI neurons. Additionally, the expression of the TRPγ channel in Dh44 R2-expressing cells restored the internal TAG levels. The authors, however, did not provide an explanation of how TRPγ might function in both presynaptic and postsynaptic cells in the non-cell autonomous manner to regulate the TAG storage. The authors further determined the effect of TRPγ mutation on the size of lipid droplets (LD) and the lifespan and found that TRPγ mutation caused an increase in the size of LD and a decrease in the lifespan, which were reverted by feeding lipase and metformin. These were creative endeavors, I thought. The finding that DH44+ PI neurons have non-cell autonomous functions in regulating bodily metabolism (mainly sugar/lipid) in addition to directing sugar nutrient sensing and consumption is likely correct, but the paper has many loose ends. I would like to see a revision that includes more experiments to tighten up the findings and appropriate interpretations of the results.

      (1) The authors need to provide interpretations or speculations as to how DH44+ PI neurons have non-cell autonomous functions in regulating the internal TAG stores, and how both presynaptic DH44 neurons and postsynaptic DH44 R2 neurons require TRPγ for lipid homeostasis.

      In Discussion, we had mentioned our previous finding. “ We previously proposed that TRPg holds DH44 neurons in a state of afterdepolarization, thus reducing firing rates by inactivating voltage-gated Na+ channels (Dhakal et al., 2022). At the physiological level, this induces the consistent release of DH44 and depletion of DH44 stores, resulting in nutrient utilization and storage malfunctions.”

      We also included the following: “TRPγ in DH44 neurons may influence the release of metabolic signals or hormones that act on postsynaptic DH44R2 cells. These postsynaptic cells could, in turn, modulate lipid storage and metabolism in a non-cell autonomous manner. However, the mechanism by which TRPγ functions in DH44R2 cells remains unclear. One possible explanation is that TRPγ in the gut may be activated by stretch or osmolarity (Akitake et al. 2015).” lines 439-440.

      This interaction between presynaptic and postsynaptic cells may ensure a coordinated response to metabolic changes and maintain lipid homeostasis. Thus, both Dh44-expressing and Dh44-R2-expressing cells are crucial for the proper functioning of TRPγ in regulating internal TAG levels and lipid storage.

      (2) The expression of TRPγ solely in DH44 R2 neurons of TRPγ mutant flies restored the TAG phenotype, suggesting an important function mediated by TRPγ in DH44 R2 neurons. However, the authors did not document the endogenous expression of TRPγ in the DH44R2+ gut cells. This needs to be shown.

      We appreciate the reviewer’s suggestion. To address this, we performed immunostaining to examine the expression of TRPγ in the DH44R2+ gut cells. Our results, as shown in Figure 7O and P, confirm that TRPγ is co-expressed in the Dh44R2+ cells in the gut. We also found that Dh44R2 is expressed in the brain. We described this in the text as follows: “Given that Dh44R2 is predominantly expressed in the intestine, we performed immunostaining to examine whether Dh44R2 co-localizes with trpγ in gut cells. Our results confirmed that Dh44R2 and trpγ are co-expressed in intestinal cells (Figure 7O and P). Additionally, we analyzed Dh44R2 expression in the brain and found that two Dh44R2-expressing cells are co-localized with Dh44-expressing cells in the PI region (Figure 7Q). To further delineate whether Dh44R2-mediated fat utilization is specific to the brain, gut, or fat body, we knocked down Dh44R2<sup>RNAi</sup> using Dh44-GAL4, myo1A-GAL4, and cg-GAL4, respectively (Figure 7–figure supplement 1E). Notably, knockdown of Dh44R2 with Myo1A-GAL4 resulted in elevated TAG levels, indicating that DH44R2 activity in lipid metabolism is specific to the gut.” lines 375-384.

      (3) While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels (Figure 7A). This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than Dh44. Alternatively, a Dh44 neuropeptide-independent pathway could mediate the lipolysis. In either case, an additional result is needed to substantiate either one of the hypotheses.

      The Dh44 mutant flies exhibited normal TAG levels, whereas Dh44R2 mutant flies showed elevated TAG levels. However, when we examined the lipid droplets in the fat body, both Dh44 mutant and Dh44R2 mutant flies displayed larger lipid droplets, indicating a disruption in lipid metabolism. Additionally, we assessed starvation survival time and found that both Dh44 and Dh44R2 mutant flies exhibited reduced survival under starvation conditions compared to controls. Supplementation with lipase (Figure 7–figure supplement 1A), glycerol (Figure 7–figure supplement 1B), hexanoic acid (Figure 7–figure supplement 1C), and mixed TAGs (Figure 7–figure supplement 1D) improved starvation survival time, further supporting that the lipid metabolism pathway was impaired in both mutants. These observations highlight the role of Dh44 in regulating lipolysis. We included related Discussion: “There is an inconsistency of total TAG levels and the LD size observed in the Dh44 mutant. This inconsistency raises a possibility that the signaling pathway from DH44 release to its receptor DH44R2 only accounts for part of the lipid metabolic process under starvation. While Dh44 mutant flies displayed normal internal TAG levels, Dh44R2 mutant flies exhibited elevated TAG levels. This suggested that the lipolysis phenotype could be facilitated by a neuropeptide other than DH44. Alternatively, a DH44 neuropeptide-independent pathway could mediate the lipolysis.” lines 429-436.

      (4) While the authors observed an increased area of fat body lipid droplets (LD) in Dh44 mutant flies (Figure 7F), they did not specify the particular region of the fat body chosen for measuring the LD area.

      We chose abdominal segments 2-3 for all fat body images, as already stated in the Nile red staining subsection of the Methods (lines 630-631).

      (5) The LD area only accounts for TAG levels in the fat body, whereas TAG can be found in many other body parts, including the R2 area as demonstrated in Figure 5A-D using Nile red staining. As such, measuring the total internal TAG levels would provide a more accurate representation of TAG levels than the average fat body LD area.

      We measured the total internal TAG level in the whole body throughout the experiments (Figure 1F, 2C, 2E, 3C, 3G, 4A, 4B, 7A, 7I, and many Supplementary Figures), except for bmm expression using the GAL4/UAS system. We now include these new data (Figure 5–figure supplement 1), which lead to the same conclusion as the LD analysis.

      (6) In Figure 5F-I, the authors should perform the similar experiment with Dh44, Dh44R1, and Dh44R2 mutant flies.

      We performed the experiments with Dh44, Dh44R1, and Dh44R2 mutant flies and found that Dh44 and Dh44R2 mutant flies showed a shorter starvation survival time than the control, and that this survival time increased after supplementation with lipase, glycerol, hexanoic acid, and TAG (Figure 7–figure supplement 1A–D). lines 361-372.

      (7) The representative image in Figure 6B does not correspond to the GFP quantification results shown in Figure 6C. In trpγ<sup>1</sup>;bmm::GFP flies, the GFP signal appears stronger in starved conditions than in satiated conditions.

      We updated it with new images. We quantified the GFP intensity using ImageJ and found that it was significantly lower under starved conditions than under sated conditions in trpγ<sup>1</sup>;bmm::GFP flies.

      (8) In Figure 6H-I, fat body-specific expression of bmm reversed the increased LD area in TRPγ mutants. The authors also showed that Dh44+PI neuron-specific expression of bmm yielded a similar result. The authors need to provide an interpretation as to how bmm acts in the fat body or DH44 neurons to regulate this.

      We first inserted the following in results: “Furthermore, the expression of bmm in the fat body, as well as Dh44 neurons in the PI region, can promote lipolysis at the systemic level.” lines 276-277.

      Additionally, we discussed it in the Discussion: “Brummer lipase is essential for regulating lipid levels in the insect fat body by mediating lipid mobilization and energy homeostasis. In Nilaparvata lugens, it facilitates triglyceride breakdown (Lu et al., 2018), while studies in Drosophila show that reduced Brummer lipase expression decreases fatty acids and increases diacylglycerol levels, highlighting its role in lipid metabolism (Nazario-Yepiz et al., 2021). Here, we additionally demonstrate that bmm expression in DH44 neurons within the PI region can systemically regulate TAG levels. Cell signaling or energy status in DH44 neurons may contribute to hormonal release that targets organs such as the fat body.” lines 451-459.

      (9) The authors should explain why the DH44 R1 mutant did not represent similar results as the wild type.

      We added “In addition, bmm levels in Dh44R1<sup>Mi</sup> under starved condition did not increase as significantly as in the control. This suggests a unique role of DH44 and its receptors in regulating lipid metabolism and response to nutritional status in Drosophila.” lines 358-360.

      (10) It would be good to have a schematic that represents the working model proposed in this manuscript.

      We updated the schematic model in the revised version (Figure 8).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      This paper characterized the function of trpγ in Dh44-expressing PI neurons for lipid metabolism and lipolysis induced by prolonged starvation. The authors applied a series of lipolytic genetic manipulation and lipid/lipid metabolism supplements to rescue the trpγ deficits in lipolysis: the expression of active AMPK in the DH44-expressing PI neurons or brummer, a lipolytic gene, in the trpγ-expressing cells, and oral administration of the anti-diabetic drug metformin, lipase, TAG and free fatty acids. Despite this exhaustive characterization of the defective lipolysis in the trpγ mutants, there remain puzzles in inconsistent defects of Dh44 and DH44R2 in the total TAG levels and in the expression and functions of the receptor in the gut. Clarification of these points and other issues raised by the reviewers should improve the mechanisms of lipid metabolism through Dh44 signalling.

      Reviewer #1 (Recommendations For The Authors):

      (1) It might be worth introducing Dh44 in the introduction section as it is unclear to readers how the authors hypothesized the site-of-action of TRPγ in Dh44 neurons for lipid metabolism after reading the introduction.

      We introduced the following: “We found that TRPγ expression in Dh44 neuroendocrine cells in the brain is critical for maintaining normal carbohydrate levels in tissues (Dhakal et al. 2022). Building on this, we hypothesized that TRPγ in Dh44 cells also regulates lipid and protein homeostasis.” lines 69-71.

      (2) Providing a summary model in the end to integrate the present findings and their previous publication about TRPγ functions in Drosophila sugar selection would greatly help readers understand and appreciate the general role of TRPγ in balancing energy homeostasis.

      We made a schematic model in Figure 8.

      (3) Swapping the order of Figures 5 and 6 might be a better way to tell the story without logic gaps. The results addressing the mechanisms of metformin and TRPγ in promoting lipolysis under starvation are interrupted by the lipid storage data in the R2 cells in the current Figure 5A-5E. In addition, presenting Figure 5A-5E before or together with Figure 7 will help readers appreciate the expression of Dh44-R2 and its function in regulating lipid metabolism in Figure 7.

      We did.

      (4) It might be misleading to use the word "sated" for the condition of 5-hour mild starvation. The word "mild starvation" or the equivalents might be a better word choice.

      We appreciate the reviewer’s concern. Because the hemolymph sugar level does not drop significantly after 5 hr of starvation, previous papers (Dus et al 2015, Dhakal et al 2022) referred to this as the sated condition. To use the term consistently, we prefer “sated” over “mild starvation”.

      (5) It is unclear what the white arrows are pointing at in Figures 7O and 7P. Some of those seem to be non-specific signals, so it is hard to connect the figure to the conclusion in Lines 351-353. It would be helpful to add some explanations to help readers interpret Figures 7O and 7P.

      In the previous version, the white arrows in Figure 7O and 7P indicated the expression of Dh44R2 in the SEZ region of the brain and the R2 region of the gut. In the revised version, for clarity, we performed additional immunostaining for the co-expression of trpγ and Dh44R2 in the gut. We found that trpγ and Dh44R2 are co-expressed specifically in the R2 region of the gut (Figure 7O and P). Similarly, we found that two Dh44R2-expressing cells co-localize with Dh44 cells in the PI region of the brain (now Figure 7Q). We updated this part. lines 375-380.

      (6) The figure legend for the (G) panel in Figure 2-figure Supplement 1 was mislabeled as (F).

      We corrected it.

      (7) In Line 85, the authors might want to write "… among these mutants, only trpγ mutant displayed reduced carbohydrate levels, suggesting …". Please confirm the information for the sentence. lines 87-88.

      We clarified it.

      Reviewer #2 (Recommendations For The Authors):

      (1) The trpγ[G4] would be difficult for non-Drosophila researchers to understand; it would be better to use trpγ-Gal4.

      We obtained the mutant line from Dr. Craig Montell, who named it. We explained it as follows in the main text: “controlled by GAL4 knocked into the trpγ locus (trpγ<sup>G4</sup> flies; +)” line 109.

      (2) The arrows in Figures 7O and 7P need to be explained in the figure legends.

      We did.

      Reviewer #3 (Recommendations For The Authors):

      (11) Lines 95-96 should have a reference.

      We did.

      (12) Lines 129-130: It should read "TRPγ expressed in DH44 cells is sufficient for the regulation of lipid levels."

      We changed it as suggested.

      (13) Figure 5E needs to be repeated with more trials.

      We increased the sample size. Previously (Figure 5E), we included the area of 10 LDs from 3 samples; in the revised figure (Figure 6I), we include 28 LDs from 10 samples.

      (14) Figures 5F-I, bold lines are not too visible and therefore, dotted lines could be used.

      We changed it as suggested.

      (15) Line 356: It is not true that D-trehalose or D-fructose is commonly detected by DH44 neurons. These sugars at concentrations much higher than the physiological concentration range stimulate DH44 neurons (see Dus et al., 2015).

      We removed it.

      (16) Lines 362-363: It should read "Expression of TRPγ in DH44 neurons was necessary and sufficient to regulate the carbohydrate and lipid levels.".

      We changed it.

      (17) Lines 369-370: The authors need to consider removing the possible role of CRF in regulating lipid homeostasis. It could be considered to be far-fetched.

      We removed it.

      (18) Line 407-408: the sentence "Nevertheless, it is also known that DH44 neurons mediate the influence of dietary amino acids on promoting food intakes in flies (37)" needs to be removed. They used amino acid concentrations that were far greater than the physiological levels observed in the internal milieu of flies. Still, many laboratories cannot reproduce the result of using the high AA concentrations.

      We removed it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (public review): 

      This manuscript presents SAVEMONEY, a computational tool designed to enhance the utilization of Oxford Nanopore Technologies (ONT) long-read sequencing for the design and analysis of plasmid sequencing experiments. In the past few years, with the improvement in both sequencing length and accuracy, ONT sequencing is being rapidly extended to almost all omics analyses which are dominated by short-read sequencing (e.g., Illumina). However, relatively higher sequencing errors of long-read sequencing techniques including PacBio and ONT are still a major obstacle for plasmid/clone-based sequencing service that aims to achieve single base/nucleotide accuracy. This work provides a guideline for sequencing multiple plasmids together using the same ONT run without molecular barcoding, followed by data deconvolution. The whole algorithm framework is well-designed, and some real data and simulation data are utilized to support the conclusions. The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services. As we know and discussed by the authors, in the real world, to ensure accuracy, the researchers will routinely pick up multiple colonies in the same plasmid construction and submit for Sanger sequencing. However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches. This is a major limitation in the significance of this work. Encouraging computational efforts in ONT data debarcoding for mixed-plasmid or even single-cell sequencing would be more valuable in the field.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      The tool SAVEMONEY is proposed to target users who have their own ONT sequencers and perform library preparation and sequencing by themselves, rather than relying on commercial services.

      We apologize that we were not clear enough in the manuscript. Our tool is designed for users who rely on commercial services (i.e., those who cannot include a barcode by themselves). However, it can also benefit those performing library preparation, as SAVEMONEY can be applied after standard barcode-based sequencing and de-multiplexing. The combination of standard barcodes with SAVEMONEY would significantly expand the scope of sequencing applications. For example, it would enable sequencing of more plasmid types than the number of available barcodes and, in some cases, it may even eliminate the need for barcode introduction. Because we do not own ONT equipment and because the primary target audience for the SAVEMONEY algorithm is users without ONT equipment, we were not able to conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph describing these issues (3rd paragraph in the discussion section).

      However, SAVEMONEY is not able to support the simultaneous analysis of multiple colonies in the same run, as compared to the barcoding-based approaches.

      We agree with the reviewer about this limitation of SAVEMONEY, as it does not allow mixing of plasmids from multiple colonies in the same cloning run. However, that does not necessarily mean that SAVEMONEY cannot reduce sequencing costs in cloning. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.
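
      The pooling arithmetic in this example can be sketched in a few lines of Python. This is an illustrative sketch only and is not part of SAVEMONEY: the function name `pool_colonies` and the construct labels are hypothetical, and the only rule encoded is the one stated above, namely that colonies from the same construct must be placed in different samples.

```python
def pool_colonies(colonies_per_construct):
    """Assign colonies to samples so that no sample contains two colonies
    of the same construct (hypothetical helper, for illustration only)."""
    # The construct with the most colonies sets the minimum number of samples.
    n_samples = max(colonies_per_construct.values())
    samples = [[] for _ in range(n_samples)]
    for construct, n_colonies in colonies_per_construct.items():
        for i in range(n_colonies):
            # Colony i of every construct goes into sample i.
            samples[i].append(f"{construct}-colony{i + 1}")
    return samples


# Two colonies from each of three constructs: six plasmids in total.
plan = pool_colonies({"A": 2, "B": 2, "C": 2})
n_plasmids = sum(len(sample) for sample in plan)

# Cost per plasmid relative to sequencing every plasmid as its own sample.
relative_cost = len(plan) / n_plasmids

for index, sample in enumerate(plan, start=1):
    print(f"sample {index}: {sample}")
print(f"relative cost per plasmid: {relative_cost:.2f}")
```

      With these inputs the six plasmids fit into two samples of three plasmids each, which reproduces the one-third cost per plasmid described above. The sketch ignores any upper limit on the number of plasmids per sample.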

      (1) To provide more comprehensive information for users who care about the cost, the Introduction section should include a cost comparison between Sanger and ONT, with more details, such as different ONT platforms (MinION, PromethION, Flongle), chemistries (flow cells) and kits. This additional information will be more helpful and informative for the users who have their own sequencers and are the target audience for SAVEMONEY.

      We thank the reviewer for pointing this out. Since we do not own ONT equipment, we are unable to provide a total cost for using the ONT platform. However, we have included the price per sample (~$15 per plasmid) for the commercial service we have used, as well as the equipment that they employ (V14 chemistry on a PromethION with an R10.4.1 flow cell) and the number of reads obtained per plasmid (~100–1000) in the 4th paragraph of the introduction section. Though these costs will inevitably change over time, this information should still be helpful for those who own ONT sequencers in estimating the costs.

      (2) In "Overview of the algorithm" (Pages 3-4) under the Results section, instead of stating "However, coverage varies from ~100-1000 and is difficult to predict because each nanopore flow cell has different properties.", it will be beneficial to provide more detailed information, such as sequencing length, yield/read count per flow cell of different platforms. This information will assist users in designing their own experiments effectively.

      We thank the reviewer for the comment. As mentioned in the previous response, we are unable to provide sequencing length, yield/read count per flow cell because we do not own ONT equipment. However, we apologize if it was not clear in the "Overview of the algorithm" section that we are discussing the use of results obtained from commercial services, and therefore we need to provide more detailed information about the results from the commercial service. We have now clarified in the sentence pointed out by the reviewer that the numbers are derived from the information provided by commercial sequencing services. In addition, we have also added that typical examples of the result properties, i.e., read length and quality score distribution, can be found in Fig. 2 at the end of the same paragraph.

      (3) While this study optimized and evaluated the tool using a total of 14 plasmids, it may not provide sufficient power to represent the diversity of the plasmid world. Consideration should be given to expanding the dataset to include a broader range of plasmids in future studies to enhance the robustness and generalizability of the tool.

      We are grateful to the reviewer for their valuable input. We agree that a larger number of plasmids should be considered, even though the main target users of SAVEMONEY are those who utilize commercial services. In the previous version of SAVEMONEY, processing could not be completed in a reasonable amount of time if too many plasmids were provided, although the algorithm itself places no restriction on the number of plasmids. Therefore, we have changed the underlying code to improve the algorithm, making it more than 20 times faster than the previous version (the benchmark time mentioned in the 3rd paragraph of the discussion section was improved to 3.1 minutes from the previous 65 minutes, using the same dataset and the same computer). Additionally, SAVEMONEY is now compatible with multiprocessing. The processing time is expected to decrease approximately in inverse proportion to the number of CPU cores used. We have added these updates at the end of the 3rd paragraph in the discussion section.

      (4) If applicable and feasible, including a comparison or benchmark of SAVEMONEY against other similar tools would further strengthen the manuscript. This comparison would allow users to evaluate the advantages and disadvantages of different tools for their specific needs.

      We thank the reviewer for the suggestion. We have added a benchmark using a similar tool, On-Ramp, with the exact same set of plasmids and FASTQ data used for our benchmark (4th paragraph in the discussion section). Because the machine specifications used in the On-Ramp web server are unknown, a direct comparison is not possible. However, using only laptop-level computational resources, SAVEMONEY was able to process the data 38% faster than On-Ramp. When using mini-PC level computational resources, the processing was 64% faster than On-Ramp.

      (5) The importance of pre-filtering raw sequencing reads should be emphasized as noisy reads can significantly impact the overall performance of the tool. It is essential to clarify whether any pre-filtering steps were performed in this study, such as filtering based on quality scores, read length, or other relevant factors. 

      We apologize for not being clear. Unfortunately, the commercial sequencing service we used did not provide information regarding pre-filtering. However, the impact of pre-filtering by quality score and read length on the quality of the final results is theoretically minimal in SAVEMONEY. First, during the initial step of the post-analysis, the classification step, reads that are short relative to the full plasmid length can be excluded based on the user-defined “score_threshold”. Simultaneously, low-quality reads with poor alignment to the plasmid can also be excluded, because “score_threshold” is related to the normalized alignment score. Even if there are low-quality reads that are not excluded at this stage, their effect can be minimized during the final step of the post-analysis that generates consensus sequences. This is because our Bayesian analysis considers not only the base calling but also the q-scores to determine the consensus. Therefore, we believe the overall impact of pre-filtering on the final results is negligible.
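
      To illustrate why q-scores limit the influence of low-quality reads on the consensus, the following is a minimal sketch of a per-position Bayesian base call. It is a textbook-style model written for illustration and is not taken from the SAVEMONEY source code: the function names are hypothetical, and the sketch assumes a uniform prior over the four bases, independent reads, and that a miscalled base is equally likely to be any of the other three bases.

```python
import math

BASES = "ACGT"


def phred_to_error(q_score):
    """Convert a Phred q-score to an error probability, p = 10^(-Q/10)."""
    # Cap at 0.75 so that a q-score near zero is simply uninformative.
    return min(10 ** (-q_score / 10), 0.75)


def column_posterior(observations):
    """Posterior probability of each base at one aligned position.

    observations: list of (called_base, q_score) pairs, one per read.
    Assumes a uniform prior and independent reads (illustrative model).
    """
    log_likelihood = {base: 0.0 for base in BASES}
    for called_base, q_score in observations:
        p_error = phred_to_error(q_score)
        for base in BASES:
            if base == called_base:
                log_likelihood[base] += math.log(1 - p_error)
            else:
                log_likelihood[base] += math.log(p_error / 3)
    # Normalize in log space to avoid underflow with many reads.
    peak = max(log_likelihood.values())
    weights = {b: math.exp(v - peak) for b, v in log_likelihood.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


# One high-quality "A" call against two low-quality "C" calls.
posterior = column_posterior([("A", 40), ("C", 3), ("C", 3)])
consensus_base = max(posterior, key=posterior.get)
print(consensus_base, round(posterior[consensus_base], 4))
```

      In this toy column a simple majority vote would call "C", whereas weighting by q-score favors the single high-confidence "A" call, which is the behavior described in the response above.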

      (6) The statement regarding the number of required reads per plasmid (20-30) and the maximum number of plasmids (up to six) that can be mixed in a single run may become outdated due to the rapid advancements in ONT technology. In the Discussion section, instead of assuming specific numbers, it would be more beneficial to provide information based on the current state of ONT sequencing, such as the number of reads per MinION flow cell that can be produced.

      We thank the reviewer for pointing this out. Because the number of required reads per plasmid depends on the accuracy of each read (i.e., the number of required reads can be reduced if the accuracy increases), we have added the description of these points to the last paragraph of the discussion section.

      Reviewer #2 (public review):  

      The authors developed an algorithm that allows for deconvoluting of plasmid sequences from a mixture of plasmids that have been sequenced by nanopore long read technology. As library preparations and barcoding of individual samples increase sequencing costs, the algorithm bypasses this need and thus decreases time on sample prep and sequencing costs. In the first step, the tool assesses which of the plasmid constructions can be mixed in a single library preparation by calculating a distance matrix between the reference plasmid and the constructions producing sequence clusters. The user is given groups of plasmids, from different clusters, to be pooled together for sequencing. After sequencing, the algorithm deconvolutes the reads by classifying them based on alignments to the reference sequence. A Bayesian analysis approach is used to obtain a consensus sequence and quality scores.

      Strengths 

      The authors exploit one of the main advantages of long-read sequencing which is to accurately resolve regions of high complexity, as regularly found in plasmids, and developed a tool that can validate plasmid constructions by reducing sequencing costs. Multiple plasmids (up to six) can be analyzed simultaneously in a single library without the need for sample barcoding, also reducing sample preparation time. Although inserts must be different, a difference of just 2 bases would be enough for a correct assignment. It maximizes cost-efficiency for projects that require large amounts of plasmid constructions and high-throughput validation.

      We thank the reviewer for the positive response to our manuscript and the helpful comments.

      Weaknesses 

      The method proposed by the authors requires prior knowledge of plasmid sequences (i.e., blueprints or plasmid reference) and is not suitable for small experiments. The plasmid inserts or backbones must be different e.g., multiple colonies from the same plasmid construction effort cannot be submitted together.

      As also discussed in the response to reviewer 1, we agree with the reviewer that SAVEMONEY does not allow the analysis of plasmids from multiple colonies in the same cloning experiment. However, that does not necessarily mean that SAVEMONEY cannot reduce the sequencing cost. For example, when sequencing two colonies from each of three different constructs (six plasmids in total), the standard approach would require sequencing costs for six samples. However, with SAVEMONEY, up to three plasmids can be mixed per sample, allowing them to be sequenced as just two samples. As a result, the sequencing cost per plasmid is reduced to one-third. The greatest benefits can be realized when SAVEMONEY is used at the laboratory level or by multiple researchers. To make this point clearer, we have added sentences in the 5th paragraph of the discussion section.

      The reviewer also expressed concern that SAVEMONEY is not suitable for experiments at a small scale. To put it more precisely, SAVEMONEY cannot be used when the experiment size is minimal, such as in a lab that consistently constructs only a single plasmid at a time. That said, the strength of SAVEMONEY lies in its scalability. Even in labs where plasmid construction is typically limited to one at a time, there may be occasional instances where two or more plasmids are created simultaneously. In such cases, SAVEMONEY can be used to reduce sequencing costs. Moreover, in a typical molecular biology lab where multiple plasmids are constructed every week, SAVEMONEY can be particularly effective. Given its adaptability and cost-saving potential and widespread use since its initial publication on bioRxiv and on Google Colab, we are confident that SAVEMONEY will continue to be a valuable tool for a wide range of researchers.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors): 

      The manuscript assumes all samples are sent out for sequencing at a specific company. This could be generalized for a much broader use since many labs now own nanopore sequencers. In turn, the advantage of reducing hands-on sample prep becomes more evident.

      We thank the reviewer for pointing this out. We agree that SAVEMONEY can also benefit those performing library preparation. Combining standard barcodes with SAVEMONEY significantly expands the scope of sequencing applications. For example, it enables sequencing of more plasmid types than the number of available barcodes and, in some cases, may even eliminate the need for the sample prep step that introduces barcodes. Because we do not own ONT equipment, we could not conduct experiments using ONT. However, to clarify these possibilities, we added a dedicated paragraph (3rd paragraph in the discussion section).

      The base calling model (high accuracy, super accuracy) used by Plasmidsaurus and tested here should be mentioned.  

      We thank the reviewer for the suggestion. The description about the base calling model (HAC) was added in Materials and Methods section.

      Other modifications to the revised manuscript 

      Beyond changes made in response to reviewer comments above, we have also, through our continued use and improvement of SAVEMONEY, made additional changes to the algorithm and therefore to the manuscript. Those changes are outlined below.

      Improvements in the pre-survey step

      (1) The pre-survey algorithm was reduced to a Zero-One Integer Linear Programming Problem to guarantee the optimal combinations, as previous versions did not ensure an optimal solution. Relatedly, the explanation of the algorithm in the main manuscript was updated.

      (2) The algorithm was modified to ensure that the number of plasmids distributed to each group is balanced. A new feature was also added to allow users to specify the number of groups, which is beneficial when balancing between cost and quality.

      (3) An error was corrected in Fig. 2, where the distance calculation method for the hierarchical clustering step for group formation was given as the Farthest Point Algorithm, which calculates the distance between two clusters based on the farthest pair of plasmids. The correct method is the Nearest Point Algorithm. This error was present only in Fig. 2, while other implementations, including the source code of SAVEMONEY and the Google Colab page, were correct from the beginning. We have corrected the error in Fig. 2.
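
      The difference between the two linkage rules named in item (3) can be illustrated with a short sketch. This is not SAVEMONEY's code: the plasmid names and pairwise distances are made up, and `cluster_distance` is a hypothetical helper that only shows how the nearest-point (single-linkage) and farthest-point (complete-linkage) rules turn pairwise distances into a distance between two clusters.

```python
# Hypothetical pairwise distances between plasmids in two clusters.
pairwise = {
    frozenset(("p1", "p3")): 2.0,
    frozenset(("p1", "p4")): 9.0,
    frozenset(("p2", "p3")): 5.0,
    frozenset(("p2", "p4")): 7.0,
}


def cluster_distance(cluster_a, cluster_b, pairwise, rule="nearest"):
    """Distance between two clusters under a given linkage rule."""
    distances = [
        pairwise[frozenset((a, b))] for a in cluster_a for b in cluster_b
    ]
    if rule == "nearest":
        # Nearest Point Algorithm: the closest pair defines the distance.
        return min(distances)
    if rule == "farthest":
        # Farthest Point Algorithm: the most distant pair defines it.
        return max(distances)
    raise ValueError(f"unknown rule: {rule}")


nearest = cluster_distance(["p1", "p2"], ["p3", "p4"], pairwise, "nearest")
farthest = cluster_distance(["p1", "p2"], ["p3", "p4"], pairwise, "farthest")
print(nearest, farthest)
```

      Under the nearest-point rule, two clusters count as close as soon as any one pair of plasmids is similar, so a single similar pair is enough to merge them.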

      Modifications in figures, manuscripts, and other aspects

      (1) Fig. 3 was updated to reflect the update of SAVEMONEY, although it did not show any important differences.

      (2) Parameter names were updated as follows:

      “threshold (pre)” -> “distance_threshold”

      “threshold (post)” -> “score_threshold”

      Added “number_of_groups”

      (3) The order of elements was rearranged in Fig. 4.

      (4) Incorrect calculations were fixed in Fig. 4g, h, and i (old Fig. 4d, h, and l). Related to that, Fig. 4j, k, and l and Table 1 were added, in addition to the explanation in the main manuscript.

      (5) SAVEMONEY was packaged and was released on PyPI to facilitate easy installation and integration by other developers.

      (6) SAVEMONEY was updated and expanded to accommodate linear DNA fragments, such as PCR amplicons and long synthetic DNA. Users can select the topology of DNA by specifying that as an option. A description of this new capability was added at the end of “Overview of the algorithm” section.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (…) some concerns with interpretations and technical issues make several major conclusions in this manuscript less rigorous, as explained in detail in comments below. In particular, the two major concerns I have: 1) the contradiction between the strong reduction of global translation, with puromycin incorporation gel showing no detectable protein synthesis in cold, and an apparently large fraction of transcripts whose abundance and translation in Fig. 2A are both strongly increased. 2) The fact that no transcripts were examined for dependance on IRE-1/XBP1 for their induction by cold, except for one transcriptional reporter, and some weaknesses (see below) in data showing activation of IRE-1/XBP-1 pathway. The conclusion for induction of UPR by cold via specific activation of IRE-1/XBP-1 pathway, in my opinion, requires additional experiments.

      Relating to the first point, the results of puromycin incorporation and ribosome profiling are not contradictory. The former shows absolute changes in translation, i.e. changes in how much protein the cell is producing, while the latter shows relative changes between the produced proteins, i.e. how the cell prioritizes its protein production. An observed up-regulation in ribosome profiling does not necessarily mean (but could) that the corresponding protein goes up in absolute terms (units produced per time). Instead, it implies that out of the population of all translating ribosomes, a larger fraction is translating (prioritizing) this particular mRNA relative to other mRNAs. The second point is addressed later in the response.
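
      The distinction can be illustrated with a small numerical sketch in Python. All numbers are hypothetical and chosen only for illustration (they are not measurements from this study): global protein output falls to 40% in the cold, while the share of translating ribosomes engaged on one particular mRNA doubles.

```python
def absolute_output(total_output, ribosome_share):
    """Protein units produced per unit time from one mRNA species."""
    return total_output * ribosome_share


# Hypothetical values, for illustration only.
total_20c, share_20c = 1000.0, 0.01  # global output and ribosome share at 20 C
total_4c, share_4c = 400.0, 0.02     # global output and ribosome share at 4 C

# What ribosome profiling reports: the change in the share of ribosomes.
relative_change = share_4c / share_20c

# What a puromycin-type (absolute) measurement would see for this protein.
absolute_change = (
    absolute_output(total_4c, share_4c) / absolute_output(total_20c, share_20c)
)

print(f"relative change: {relative_change:.2f}-fold")
print(f"absolute change: {absolute_change:.2f}-fold")
```

      In this toy case, the mRNA is prioritized twofold among translating ribosomes, yet its absolute protein output is lower than at 20ºC because the total output fell more than the share rose.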

      Major concerns:

      (1) Fig. 1B shows polysomes still present on day 1 of 4ºC exposure, but the gel in Fig. 1C suggests a complete lack of protein synthesis. Why?

      We realized that the selected gel exposure may give the false impression of a complete lack of puromycin incorporation at 4ºC. To avoid confusion, we now show in Figure 1 – figure supplement 1 the original gel image next to its longer exposure. The quantification of puromycin incorporation remains in Fig. 1C (it is based on 3 biological replicates and only one replicate is shown in the corresponding supplement). We hope it is now clear that there is an ongoing puromycin incorporation/translation at 4ºC, albeit much reduced compared with 20ºC.

      What is then the evidence that ribosomal footprints used in much of the paper as evidence of ongoing active translation are from actual translating rather than still bound to transcripts but stationary ribosomes, considering that cooling to 4ºC is often used to 'freeze' protein complexes and prevent separation of their subunits? The authors should explain whether ribosome profiling as a measure of active translation has been evaluated specifically at 4ºC, or test this experimentally.

      While the ribosomal profiling alone might not prove ongoing translation, the residual puromycin incorporation does (see the longer gel exposure in Figure 1 – figure supplement 1). To strengthen this argument, we selected two additional genes (cebp-1 and numr-1) whose ribosomal footprints increase in the cold, and whose GFP-fusions were available from the CGC. Monitoring their expression, we observed the expected increase in the cold (see Figure 2 – figure supplement 3 A-B). The ongoing translation in the cold is also in line with our previous study (Pekec et al., 2022), where we observed de novo protein synthesis of other proteins under the same cooling conditions as in this study.

      They should also provide some evidence (like Western blots) of increases in protein levels for at least some of the strongly cold-upregulated transcripts, like lips-11.

      As explained above, we addressed it by additionally examining two strains expressing GFP-fused proteins, whose translation in the cold is predicted to increase according to our ribosomal profiling data. See the new Figure 2 – figure supplement 3 A-B.

      As puromycin incorporation seems to be the one direct measure of global protein synthesis here, it conflicts with much of the translation data, especially considering that quite a large fraction of transcripts have increased both mRNA levels and ribosome footprints, and thus presumably increased translation at 4ºC, in Fig. 2A.

      We hope the above explanations put this concern to rest.

      Also, it is not clear how quantitation in Fig. 1C relates to the gel shown, the quantitation seems to indicate about 50-60% reduction of the signal, while the gel shows no discernable signal.

      As above, see a longer western blot exposure in Figure 1 – figure supplement 1 and note that the quantification is based on three biological replicates.

      (2) It is striking that plips-11::GFP reporter is induced in day 1 of 4ºC exposure, apparently to the extent that is similar to its induction by a large dose of tunicamycin (Fig. 3 supplement),

      We did not intend to compare the extent of induction between cold and tunicamycin treatment. The tunicamycin experiment was meant to confirm that, as suggested by expression data from Shen et al. 2005, lips-11 is upregulated upon UPR activation.

      …but the three IRE-1 dependent UPR transcripts from Shen 2005 list were not induced at all on day 1 (Fig. 4 supplement). Moreover, the accumulation of the misfolded CPL-1 reporter, that was interpreted as evidence that misfolding may be triggering UPR at 4ºC, was only observed on day 1, when the induction of the three IRE-1 targets is absent, but not on day 3, when it is stronger. How does this agree with the conclusion of UPR activation by cold via IRE-1/XBP-1 pathway?

      In the originally submitted supplemental figure, we compared mRNA levels between day 1 animals at 20ºC versus 4ºC. However, as argued later by this reviewer, it may be better to use day 0 animals at 20ºC as the reference (since at 20ºC the animals will continue producing embryos). Thus, we repeated the RT-qPCR analysis with additional time points (and genes relevant to other comments). This analysis, now in Figure 4 – figure supplement 2, shows that these mRNAs (dnj-27, srp-7, and C36B7.6) increased already at day 1 in the cold compared with the reference 20ºC animals on day 0, and their levels increased further on day 3.

      It is true that the authors do note very little overlap between IRE-1/XBP-1-dependent genes induced by different stress conditions, but for most of this paper, they draw parallels between tunicamycin-induced and cold induced IRE-1/XBP-1 activation.

      We carefully re-examined the manuscript to ensure that we do not draw parallels between cold and tunicamycin treatment. The three genes (dnj-27, srp-7, and C36B7.6) were taken from Shen et al. because that study reported lips-11 as an IRE-1-responsive gene, which we realized thanks to the Wormbase annotation of lips-11. Examining the three genes in our expression data, srp-7 (like lips-11) is also upregulated more than 2-fold, while the other two genes go up but less than 2-fold. As mentioned by the reviewer, we note little overlap between the different stress conditions suggesting that the response is context dependent. Additional differences may arise if, as we hypothesize, UPR is activated in the cold in response to both protein and lipid stress. Note that the 2-fold cutoff used in the previous Figure 7 – figure supplement 1 was (erroneously) on the log2 scale, so showed genes upregulated at least 4-fold. We now corrected it to 2-fold. While there are now a few more overlapping genes, the overall conclusion, that there is little overlap between different conditions, did not change. We now list the shared genes in the new Supplementary file 5.
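
      The effect of applying the cutoff on the wrong scale can be seen in a short Python sketch. The gene names and log2 fold changes below are made up for illustration, and the helper `upregulated` is hypothetical; it is not the code used for the analysis.

```python
import math


def upregulated(log2_fold_changes, log2_cutoff):
    """Return genes whose log2 fold change is at or above the cutoff."""
    return sorted(
        gene for gene, lfc in log2_fold_changes.items() if lfc >= log2_cutoff
    )


# Hypothetical log2 fold changes (cold versus reference).
toy = {"gene_a": 0.5, "gene_b": 1.2, "gene_c": 2.3}

# A cutoff of "2" applied directly to log2 values selects genes up at least
# 2**2 = 4-fold.
erroneous = upregulated(toy, 2.0)

# A 2-fold cutoff corresponds to log2(2) = 1 on the log2 scale.
corrected = upregulated(toy, math.log2(2))

print(erroneous, corrected)
```

      With the corrected cutoff, more genes pass the filter, which is why a few additional overlapping genes appear after the correction.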

      The conclusion that "the transcription of some cold-induced genes reflects the activation of unfolded protein response (UPR)..." is based on analysis of only one gene, lips-11. No other genes were examined for IRE-1 dependence of their induction by cold, neither the other 8 genes that are common between the cold-induced genes here and the ER stress/IRE-1- induced in Shen 2005 (Venn diagram in Figure 7 supplement), nor the hsp-4 reporter. What is the evidence that lips-11 is not the only gene whose induction by cold in this paper's dataset depends on IRE-1? This is a major weakness and needs to be addressed.

      Furthermore, whether induction by cold of lips-11 itself is due to IRE1 activation was not tested, only a partial decrease of reporter fluorescence by ire-1 RNAi is shown. A quantitative measure of the change of lips-11 transcript in ire-1 and xbp-1 mutants is needed to establish if it depends on IRE-1/XBP-1 pathway.

      We now examined by RT-qPCR whether the induction of the three genes from Shen et al. (dnj-27, srp-7, and C36B7.6), as well as lips-11 and hsp-4, depends on IRE-1. In the new Figure 4 – figure supplement 2, we show that the upregulation of all these genes in the cold is reduced in the ire-1 mutant (although in the wild type, the increase of hsp-4 mRNA appeared to be non-significant, despite the observed upregulation of the hsp-4 GFP reporter).

      The authors could provide more information and the additional data for the transcripts upregulated by both ER stress and cold, including the endogenous lips-11 and hsp-4 transcripts: their identity, fold induction by both cold and ER stress, how their induction is ranked in the corresponding datasets (all of these are from existing data), and do they depend on IRE-1/XBP-1 for induction by cold?

      As above, the dependence of endogenous lips-11 and hsp-4 on IRE-1 is now shown in the new Figure 4 – figure supplement 2, and the shared genes from Figure 7 – figure supplement 1 are listed in the new Supplementary file 5. We did not perform additional analysis comparing various data sets, as we felt that understanding the differences between IRE-1-mediated transcription outputs across different conditions goes well beyond this study.

      Without these additional data and considering that the authors did not directly measure the splicing of xbp-1 transcript (see comment for Fig. 3 below), the conclusion that cold induces UPR by specific activation of IRE-1/XBP-1 pathway is premature.

      To address the splicing of endogenous xbp-1, we examined our ribosome profiling data for the translation of spliced xbp-1, and found that the spliced variant is more abundant in the cold. This data is now shown in Figure 3 – figure supplement 2B.

      There are also technical issues that are making it difficult to interpret some of the results, and missing controls that decrease the rigor of conclusions:

      (1) For RNAseq and ribosome occupancy, were the 20ºC day 1 adult animals collected at the same time as the other set was moved to 4ºC, or were they additionally grown at 20ºC for the same length of time as the 4ºC incubations, which would make them day 2 adults or older at the time of analysis? This information is only given for SUnSET: "animals were cultivated for 1 or 3 additional days at 4ºC or 20ºC".

      In the RNAseq experiments, the 20ºC animals were collected at the same time as the others were moved to 10ºC (and then 4ºC), so they were not additionally grown at 20ºC. We now make this clear in Methods.

      This could be a major concern in interpreting translation data: First, the inducibility of both UPR and HSR in worms is lost at exactly this transition, from day 1 to day 2 or 3 adults, depending on the reporting lab (for example Taylor and Dillin 2013, Labbadia and Morimoto, 2015, De-Souza et al 2022).

      As explained above, the 20ºC animals were collected at the same time as the others were moved to 4ºC. Moreover, we previously reported that ageing appears to be suppressed in animals incubated at 4ºC (Habacher et al., 2016; Figure S1C). Thus, in terms of their biological age, cold-incubated animals appear to be closer to the 20ºC animals at the time they are moved to the cold (day 0). Consequently, the ageing-associated deterioration in UPR inducibility mentioned above presumably does not apply to cold-incubated animals, which is in line with the observed IRE-1-dependent upregulation of several genes in day 3 animals at 4ºC.

      How do authors account for this? Would results with reporter induction, or induction of IRE-1 target genes in Fig. 4, change if day 1 adults were used for 20ºC?

      Our analysis in Figure 4 – figure supplement 2 now includes 20ºC animals at day 0, 1, and 3.

      Second, if animals at the time of shift to 4ºC were only beginning their reproduction, they will presumably not develop further during hibernation, while an additional day at 20ºC will bring them to the full reproductive capacity. Did 4ºC and 20ºC animals used for RNAseq and ribosome occupancy have similar numbers of embryos, and were the embryos at similar stages?

      As explained above, the reference animals at 20ºC were young adults containing few embryos. Indeed, at 4ºC the animals do not accumulate embryos. Although we cannot say that for all genes, note that the genes analysed in Figure 4 – figure supplement 2 increase in abundance also when compared with the day 3 animals kept at 20ºC.

      (2) Second, no population density is given for most of the experiments, despite the known strong effects of crowding (high pheromone) on C. elegans growth. From the only two specifics that are given, it seems that very different population sizes were used: for example, 150 L1s were used in survival assay, while 12,000 L1s in SUnSET. Have the authors compared results they got at high population densities with what would happen when animals are grown in uncrowded plates? At least a baseline comparison in the beginning should have been done.

      None of the experiments involved crowded populations. In the SUnSET experiments, we just used larger and more plates to obtain sufficient material.

      (3) Fig. 3: it is unclear why the accepted and well characterized quantitative measure of IRE1 activation, the splicing of xbp-1 transcript, is not determined directly by RT-PCR. The fluorescent XBP-1spliced reporter, to my knowledge, has not been tested for its quantitative nature and thus its use here is insufficient. Furthermore, the image of this fluorescent reporter in Fig. 3b shows only one anterior-most row of cells of intestine, and quantitation was done with 2 to 5 nuclei per animal, while lips-11 is induced in entire intestine. Was there spliced XBP-1 in the rest of the intestinal nuclei? Could the authors show/quantify the entire animal (20 intestinal cells) rather than one or two rows of cells?

      As explained above, we now included the analysis of xbp-1 splicing in Figure 3 – figure supplement 2B. As for the fluorescent reporter, it is difficult to measure all gut nuclei since part of the gut is occluded by the gonad. Nonetheless, we do see induction of the reporter in other gut nuclei and show now additional examples from midgut in Figure 3 – figure supplement 2A.  

      (4) The differences in the outcomes from this study and the previous one (Dudkevich 2022) that used 15ºC to 2ºC cooling approach are puzzling, as they would suggest two quite different IRE-1 dependent programs of cold tolerance. It would be good if authors commented on overlapping/non-overlapping genes, and provided their thoughts on the origin of these differences considering the small difference in temperatures.

      Indeed, there seem to be substantial differences between different temperatures and cooling paradigms. While understanding the C. elegans responses to cold is still in its infancy, one possible explanation for the observed differences is that we used different starting growth temperatures. While the initial populations in our study were grown at 20ºC, Dudkevich et al. used 15ºC. Worms display profound physiological differences between these two temperatures. For example, Xiao et al. (2013) showed that the cold-sensitive TRPA-1 channel is important at 15ºC but not 20ºC. Thus, the trajectories along which worms adapt to near freezing temperature may vary depending on their initial physiological state (and perhaps the target temperature, as we used 4ºC and they 2ºC). We have now expanded the argumentation on this topic in Discussion. We should also note that we planned on testing NLP-3 function in our paradigm, but our request for strains remained unanswered.

      Second, have the authors performed a control where they reproduced the rescue by FA supplementation of poor survival of ire-1 mutants after the 15ºC to 2ºC shift? Without this or another positive control, and without measuring change in lipid composition in their own experiments, it is unclear whether the different outcomes with respect to FAs are due to a real difference in adaptive programs at these temperatures, or to failure in supplementation?

      While we did not re-examine the findings by Dudkevich et al., we did include now another positive control. As reported by Hou et al. (2014), supplementing unsaturated FAs rescues the induction of the hsp-4 reporter in fat-6 RNAi-ed animals. Although we were able to reproduce that result (Figure 6 – figure supplement 1), the same supplementation procedure did not suppress the lips-11 reporter (Figure 6 – figure supplement 2).

      (5) Have the authors tested whether and by how much ire-1(ok799) mutation shortens the lifespan at 20ºC? This needs to be done before the defect in survival of ire-1 mutants in Fig. 7a can be interpreted.

      The lifespan at standard cultivation temperature was examined by others (Henis-Korenblit et al., 2010; Hourihan et al., 2016), showing that ire-1(ok799) mutants live shorter. However, while some mechanisms that prolong lifespan may also improve cold survival, the two phenomena are not identical and whether IRE-1 facilitates longevity and cold survival in the same or different way remains to be seen.

      Reviewer #2:

      (1) The conclusions regarding a general transcriptional response are based on one gene, lips-11, which does not affect survival in response to cold. We would suggest altering the title, to replace "Reprograming gene expression" with "Regulation of the lipase lips-11".

      We now examined IRE-1 dependent induction of additional genes (see Figure 4 – figure supplement 2). While we do not know what fraction of cold-induced genes depends on IRE-1, we feel that our findings justify the statement that gene expression in the cold involves the IRE-1/XBP-1 pathway (title) or that the transcription of some/a subset of cold-induced genes depends on this pathway (in abstract, model, and discussion).

      (2) There is no gene ontology with the gene expression data.

      We now included the top 10 most enriched and suppressed gene categories between 10ºC and 4ºC (since the biggest change happens between these conditions, as shown in Figure 2 – figure supplement 1A). This is now included in Figure 2 – figure supplement 2.

      (3) Definitive conclusions regarding transcription vs translational effects would require use of blockers such as alpha-amanitin or cycloheximide.

      As explained also for reviewer 1, we confirmed now that at least some genes, whose translation is upregulated based on the ribosome profiling, are indeed upregulated in the cold at the protein level (Figure 2 – figure supplement 3A-B). Thus, the increase in ribosomal occupancy seems to accurately reflect increased translation. Since mRNA levels correlate overall with the ribosomal occupancy, it appears that the mRNA levels are the main determinants of the translation output. Because the lips-11 promoter is sufficient to upregulate the GFP reporter in the cold, it further suggests that the regulation happens at the transcription level. It is true that at this point we cannot completely rule out the effects of mRNA stability, which we clearly acknowledge in the discussion.

      (4) Conclusions regarding the role of lipids are based on supplementation with oleic acid or choline, yet there is no lipid analysis of the cold animals, or after lips-1 knockdown.

      We agree that this is an important direction for future studies but feel that lipidomic analysis goes beyond the scope of current work.

      Although choline is important for PC production, adding choline in normal PC could have many other metabolic impacts and doesn't necessarily implicate PC without lipidomic or genetic evidence.

      We agree and acknowledge it now in Discussion: “However, choline also plays other roles, including in neurotransmitter synthesis and methylation metabolism. Thus, we cannot yet rule out the possibility that the protective effects of choline supplementation stem from functions outside PC synthesis.”

      Reviewer #3:

      The study has several weaknesses: it provides limited novel insights into pathways mediating transcriptional regulation of cold-inducible genes, as IRE-1 and XBP-1 are already well-known responders to endoplasmic reticulum stress, including that induced by cold.

      We presume the reviewer refers to the study by Dudkevich et al. (2022). As explained in our manuscript, there are important differences between that study and ours in how the IRE-1 signalling is utilized and to what ends.

      Additionally, the weak cold sensitivity phenotype observed in ire-1 mutants casts doubt on the pathway's key role in cold adaptation. The study also overlooks previous research (e.g., PMID: 27540856) that links IRE-1 to SKN-1, another major stress-responsive pathway, potentially missing important interactions and mechanisms involved in cold adaptation.

      We state in the manuscript that the IRE-1 pathway plays a modest but significant role in cold adaptation and state in the Fig. 7 model and Discussion that additional pathways work alongside IRE-1 to drive cold-specific gene expression.

      Recommendations for the authors:

      Reviewer #1:

      Minor comments:

      (1) Fig. 2B - reporter expression seems to be already present in the intestine of 20ºC animals. What is the turnover rate of GFP in the intestine and how is it affected by the temperature shift? If GFP degradation is inhibited, could it explain the increase in signal in 4ºC animals, rather than increased transcription? This seems to be true for the hsp-4 transcriptional reporter, as the GFP fluorescence appears to increase during 4ºC incubation (Fig. 4a), but the hsp-4 message levels are only increased after 1 day but not in later days at 4ºC, based on the RNAseq in provided dataset. How well do changes in lips-11 reporter fluorescence correspond to the changes in the endogenous lips-11 transcript?

      Note that increased GFP fluorescence is accompanied by increased mRNA levels. In addition to the RNAseq data, we now also examined changes of the endogenous lips-11 transcript by RT-qPCR and observed its strong (and IRE-1 dependent) upregulation in the cold (see Figure 4 – figure supplement 2). Moreover, we now included two other examples of GFP-tagged proteins whose fluorescence increases in the cold, concomitant with increased mRNA levels and ribosomal occupancy (Figure 2 – figure supplement 3A-B).

      (2) Descriptions of methods to measure different aspects of translation are very abbreviated and in some places make it difficult to understand the paper. One example - what is RFP in Fig. 2a?

      We have now replaced “RFP” with “RPF” (ribosome protected fragment), and the abbreviation is explained the first time it is used.

      (3) How was the effectiveness of RNAi at 4ºC validated?

      As explained in Methods, we subjected animals to RNAi long before they were transferred to 4ºC, so the corresponding protein is depleted prior to cooling.

      (4) Several of the conclusions on translation and ribosomal occupancy are written in a somewhat confusing way. For example, the authors state that "shift from 10ºC to 4ºC had a strong effect" when describing "impact on translation (ribosomal occupancy)" (page 4), but in the next sentence, they state "a good correlation between mRNA levels and translation (Figure 2A)". Was ribosomal occupancy normalized to the transcript abundance?

      We do not perceive any discrepancy between the two statements. The former refers to the difference between time points, where we observed the largest change in both the transcriptome and ribosomal occupancy from 10ºC to 4ºC (as can be inferred from the PCA plot in Figure 2 – figure supplement 1). The latter refers to the observation that changes in mRNA levels were mirrored, in most cases, by similar changes in the ribosomal occupancy.

      The ribosomal occupancy was not normalized, as that would essentially normalize the y-axis (ribosomal occupancy) with the x-axis (mRNA), and so express changes in “translational efficiency” as a function of changes in mRNA abundance. While this type of analysis can also reveal interesting biological phenomena, it would explore a different question.
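
      For reference, the normalization discussed here is usually expressed on the log2 scale, where dividing the ribosomal occupancy by the mRNA abundance becomes a subtraction. A minimal Python sketch with hypothetical values (the function name and the numbers are illustrative and not taken from our data):

```python
def log2_te_change(log2_fc_rpf, log2_fc_mrna):
    """Change in translational efficiency on the log2 scale.

    Dividing the RPF fold change by the mRNA fold change is equivalent to
    subtracting the two log2 fold changes.
    """
    return log2_fc_rpf - log2_fc_mrna


# Hypothetical transcript whose ribosomal occupancy simply follows its mRNA level.
coupled = log2_te_change(log2_fc_rpf=3.0, log2_fc_mrna=3.0)

# Hypothetical transcript whose occupancy rises more than its mRNA level.
uncoupled = log2_te_change(log2_fc_rpf=3.0, log2_fc_mrna=1.0)

print(coupled, uncoupled)
```

      A transcript whose occupancy tracks its mRNA level shows no change after this normalization, which is why the normalized analysis addresses a different question from the comparison of the two axes in Figure 2A.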

      (5) "For most transcripts ... increased the abundance of a particular protein appears to correlate depend primarily on the abundance of its mRNA" (page 5). This is an overstatement, the protein levels were not quantified.

      As explained above, we now additionally monitored the expression of two GFP-tagged proteins (CEBP-1 and NUMR-1). Monitoring their expression, we observed the expected increase in GFP fluorescence in the cold (see Figure 2 – figure supplement 3 A-B). While we did not examine them also by western blot, these observations are in line with our conclusions.

      (6) The statement "Since transcription is the main determinant of mRNA levels, these results suggest that cold-specific gene expression primarily depends on transcription activation" seems to assume that message degradation doesn't have much of an impact at 4ºC. What is the evidence here? The authors themselves later suggest either transcription or mRNA stability in Discussion.

      While we cannot exclude that mRNA stability of some genes may be affected, this concern is more valid for the messages that go down in the cold. Although we have done it for only selected genes, each time we observed an increase in the mRNA levels, we also observed the corresponding increase in the protein (this study and Pekec et al., 2022). In addition, the lips-11 reporter was designed to monitor the activity of its promoter, which we showed is sufficient to upregulate reporter GFP in the cold. We have now expanded the corresponding paragraph in Discussion, which will hopefully come across as more balanced.

      Reviewer #2:

      (1) Alter title, conclusions to better reflect specific nature of the work.

      We now provided additional data and feel that it justifies our conclusions and title.

      (2) Use Gene Ontology searches to look at patterns of gene expression in RNA seq data.

      We now show it in Figure 2 – figure supplement 2.

      (3) Use genetic or lipidomic tools rather than solely adding exogenous lipids.

      We agree that lipidomic analysis is an important direction for future research, but feel that lipidomic analysis and further genetic experiments go beyond the scope of current manuscript.

      Reviewer #3:

      To strengthen the evidence for the role of IRE-1 in cold adaptation, the authors might consider performing additional functional assays, such as testing the effects of IRE-1 and XBP-1 mutations under varying cold conditions and testing the genetic interaction of ire-1 with xbp-1, skn-1, and hsf-1 in cold sensitivities. It is also worth using alternative approaches such as independent alleles of ire-1, knockdowns or tissue-specific knockouts (without potential developmental compensation in global constitutive mutants) to better characterize the contribution of IRE-1 to cold adaptation. Additionally, studies that examine tissue-specific responses to cold exposure could provide important insights, as different tissues may utilize distinct molecular pathways to adapt to cold stress.

      We also tested ire-1 and xbp-1 functions by RNAi-mediated depletion. SKN-1 is a good candidate for future studies, but Horikawa et al. (2024) showed that HSF-1 is not required for cold dormancy (at 4ºC); we also show now that HSF-1::GFP does not increase in the cold (Figure 2 – figure supplement 3C).

      This reviewer also recommends clarifying the novelty of your findings in the context of existing literature, particularly regarding the established roles of IRE-1 and XBP-1 in responding to endoplasmic reticulum stress.

      The entry point of this study was to clarify a long-standing problem in hibernation research, i.e., the apparent discrepancy between a global translation repression and de novo gene expression observed in the cold. By connecting cold-mediated expression of some genes to the IRE-1/XBP-1 pathway, we strengthen the argumentation for transcription-mediated gene regulation in hibernating animals. We did go the extra mile to test the possible reason behind the activation of UPR^ER in the cold but feel that a deeper analysis deserves a separate study.

      The term "hibernation" should be avoided or reworded since the study does not provide direct behavioral or physiological evidence for hibernation-like states; instead, the manuscript could refer to "cold-induced responses" or "adaptations to cold temperatures."

      The term “hibernation” was used before even in the context of the C. elegans dauer state, which, arguably, is even less appropriate. In addition to a global suppression of translation shown here, we reported before that the same cooling regime suppresses ageing (Habacher et al., 2016; Figure S1C). Incubating at 4ºC also arrests C. elegans development (Horikawa et al., 2024). Thus, while the worm and mammalian hibernation are certainly not equivalent – which we clearly spell out – we like to use “hibernation” interchangeably with “cold dormancy” to draw attention to a fascinating aspect of C. elegans biology. Still, we use now quotation marks in the title to avoid misunderstanding.

      The discussion could be strengthened by addressing the relevance of prior studies, such as those linking IRE-1 to SKN-1 (PMID: 27540856), TRPA-1 (PMID: 23415228), ZIP-10 (PMID: 29664006), HSF-1 (PMID: 38987256) in cold adaptation and elaborating on how your findings provide new

      The IRE-1/SKN-1 and ZIP-10 papers are now mentioned when describing the model in Figure 7. The TRPA-1 and HSF-1 papers are cited when discussing physiological differences between different cold temperatures. Consistent with our studies, the HSF-1 paper shows that nematodes enter a dormant state at 4ºC (but at 9ºC and higher temperatures continue developing). Importantly, HSF-1 promotes the development at 9ºC but is not important for the arrest at 4ºC. We also show now in Figure 2 – figure supplement 3C that HSF-1 does not go up at 4ºC.

    1. Author response:

      Reviewer #1 (Public Review):

      (1) The authors conclude that the committed progenitors revert to GSCs based on the coexpression of nanos2 and foxl2l and based on expression of id1 in mutants but not in WT. Without functional data demonstrating that the progenitors revert to an earlier state, alternative interpretations should be considered. For example, it is possible that the cells initiate the committed progenitor program but continue to express the GSC program and that the coexpression of both programs blocks differentiation.

      Thanks for your insightful comment. We have explored possible alternative interpretations of our data. Regarding the suggested possibility of a continued GSC program in the mutant, we have examined the expression of GSC markers, including nanos2, in the mutant at different stages. We found that in the mutant, nanos2 and other GSC markers were not significantly upregulated in the GSC-to-progenitor transition (G-P) or in early progenitors (Prog-E) (Fig. 4B). The expression of these GSC markers was also low in the integrated clusters I4-I6, where the G-P and Prog-E stages were prominent (Fig. 3D and Fig. 3E). The GSC marker nanos2 was high only in mutant Prog-C. These results argue against a continued GSC program in the foxl2l mutants. Another possible explanation is that some mutant Prog-C cells acquire GSC properties through the upregulation of nanos2, rather than maintaining a continuous GSC program. We have now clarified our rationale about mutant cells gaining new GSC properties and included both interpretations in the Results.

      Consistent with this possibility, some Fox family members, FoxL2 and FoxPs for example, are known to be both activators and repressors of transcription or act primarily as repressors. Potentially relevant to this work, repressive activity of FoxL2 has been previously reported in the mammalian ovary (Pisarska et al Endocrinology 2004, Pisarska Am J. Phys Endo. Metabolism 2010, Kuo Reproduction 2012, Kuo Endocrinology 2011, as well as more recent publications). In that context interfering with FoxL2 was proposed to cause upregulated expression of genes normally repressed by FoxL2, accelerated follicle recruitment, and premature ovarian failure.

      FoxL2 exerts both activating and repressive activities. We believe that Foxl2l can also both activate and repress the expression of its target genes. Although its target genes have not been clearly identified, Foxl2l may activate genes involved in processes such as oogenic meiosis, and may also repress genes involved in other processes, perhaps including nanos2.

      (2) The authors conclude that the committed progenitor stage is "the gate toward female determination" and that the cells "stay at S-Phase temporarily before differentiation". This conclusion seems to be based solely on single cell RNAseq expression. In several species, including zebrafish, meiotic entry occurs earlier in females and has been correlated with ovary development. The possibility that the late progenitor stage, the stage when meiotic genes are detected in this study and a stage missing in foxl2l mutants, is actually the key stage for female determination cannot be excluded by the data provided.

      We agree that Prog-L is important for the initiation of female meiosis. We have revised the text to point out the importance of Prog-L in female differentiation.

      (3) The authors discuss prior work showing that loss of germ cells leads to male development and that germ cells are required for female development and claim to extend that work by showing here that some progenitors are already sexually differentiated. First, the stages compared are completely different. The earlier work looks at the primordial germ cells and their loss in the first few days of development before a gonad forms. In contrast, this work examines stages well after the gonad has formed and during sex determination.

      Both previous studies and our study indicate the important role of germ cells in zebrafish sex differentiation during gonadal development. The earlier works show that the abundance of primordial germ cells contributes to sex differentiation. Our current finding further suggests the existence of female identity in some germ cells at the juvenile stage, and we discuss the importance of germ cells in sexual differentiation. We have added the developmental age in our study to emphasize the age difference.

      The second concern is that the conclusion that the progenitors are differentiated is based solely on the expression of foxl2l, which is initially expressed in the juvenile ovary state that lab strains have been shown to develop through (Wilson et al Front Cell Dev Bio 2024). While it is fair to state that some cells express ovary markers at this stage, it is unclear that this is sufficient evidence that the cells are differentiated.

      The conclusion about the differentiation of progenitors is not based solely on foxl2l expression; rather, it rests on the whole transcriptomic profiles of both WT (Figure 1B) and foxl2l mutant cells (Figure 3A) as well as the foxl2l mutant phenotype (Figure 2C). Three types of progenitors, Prog-E, Prog-C and Prog-L, were identified by whole transcriptomic analysis in WT. In foxl2l mutants, the transcriptomic profile further shows that Prog-L and meiotic cells are completely lost, and all germ cells eventually undergo male differentiation. Together, these results indicate that the differentiation of Prog-C to Prog-L guides the progenitors toward female differentiation. Our results also show that in the juvenile gonad, foxl2l expression is high in two types of progenitors, Prog-C and Prog-L, and becomes low after meiotic entry.

      For example, in the context of the foxl2l mutant, the authors observe that GSCs and early progenitors inappropriately express foxl2l, but the mutants develop as males. Thus, expression of foxl2l transcripts alone is insufficient evidence to claim that the cells are already differentiated as female.

      The foxl2l mutants eventually develop into males because they lack functional Foxl2l. Although the mutated foxl2l transcript is present in mutant cells, it is not functional. This result is consistent with our claim that functional Foxl2l is important for the development of Prog-L and female differentiation.

      (4) The comparison between medaka and zebrafish foxl2l mutants seems to suggest that Foxl2l is required for meiosis in medaka but has a different role in zebrafish. However, if foxl2l represses the earlier developmental programs of GSCs and early progenitors, it is possible that continued expression of these early programs interferes with activation of meiotic genes. This could account for the absence of the late progenitor stage in foxl2l mutants since the late progenitor stage is defined by and distinguished from the earlier stages by expression of foxl2l and meiotic genes. If so, foxl2l may be similarly required in both systems.

      Medaka and zebrafish Foxl2l may share similar functions, such as the stimulation of meiotic gene expression and the promotion of oogenesis in female germ cells preparing for meiotic entry. In addition, we also detected aberrant upregulation of nanos2 in some foxl2l mutant cells. The idea that “continued expression of these early programs interferes with activation of meiotic genes” is conceivable, but for now we have no evidence for it. We do not know whether the absence of meiotic gene expression is due to interference caused by the activation of nanos2 or to the complete loss of Prog-L and meiotic cells. It will also be interesting to find out whether medaka Foxl2l has a role in early progenitors.

      (5) The authors state that "Foxl2l may ensure female differentiation by preventing stemness and antagonizing male development." It is unclear why suppressing stemness would be necessary for female differentiation since female zebrafish have stem cells as do male zebrafish. It seems likely that turning off the GSC and early differentiation programs is important for allowing expression of meiosis and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes.

      It is true that we have not proved whether suppression of stemness is required for female differentiation. Maybe our earlier statement is a bit misleading. We agree that it is likely that turning off the GSC and early differentiation programs is important for allowing expression of meiotic and oocyte differentiation genes, and that a gene other than Foxl2l is required for differentiation from GSCs to spermatocytes. To avoid confusion, we have modified our statement in the text.

      (6) Based on its expression in mutant progenitors, p53 is proposed to assist with alternative differentiation of mutant germ cells. Although p53 transcripts are expressed, no evidence is provided that p53 is involved in differentiation of germ cells, and sex bias has not been associated with the published p53 mutants in zebrafish. Furthermore, while p53 has been shown to be important for ovary to testis transformation in mutant contexts in adults, it appears dispensable for testis development in mutants that disrupt ovary differentiation in earlier stages (Rodriguez-Mari et al PLoS Gen 2010, Shive PNAS 2010, Hartung et al Mol. Reprod. Dev 2014, Miao Development 2017, Kaufman et al PLoSGen 2018, Bertho et al Development 2021). It is possible that p53 eliminates foxl2l mutant germ cells that are simultaneously expressing multiple developmental programs, but this possibility would need to be tested.

      The tp53<sup>-/-</sup>foxl2l<sup>-/-</sup> double mutation does not alleviate the all-male phenotype of the foxl2l<sup>-/-</sup> mutant (Dev Biol, 517, 91-99, 2024), indicating that the male development is not due to p53-mediated germ cell apoptosis. We have cited the suggested papers and compared the role of tp53 among the mutants (fancl, zar1, etc.) described in those papers. Since tp53 was enriched in certain foxl2l<sup>-/-</sup> mutant cell clusters, and tp53 mutation fails to rescue the all-male phenotype, it is possible that p53 expressed in these mutant cell clusters has roles other than inducing apoptosis. One assumption is that p53 may be involved in germ cell differentiation, especially as p53 is known to promote adipogenesis and the differentiation of airway epithelial progenitors and embryonic stem cells. We have emphasized in the Discussion that the suggested role of p53 in germ cell differentiation is our assumption.

      Reviewer #3 (Public Review):

      This is the first report to show that a transcription factor, foxl2l, is essential for the development of female germ cells. Without foxl2l, germ cells develop into sperm. The report also clearly defines the arrested stage of early germ cells in foxl2l mutants, that is, the stage at which foxl2l is critical for the further development of female germ cells.

      (1) Due to the lack of cell lineage tracing, the claim that foxl2l suppresses the dedifferentiation of progenitor cells to GSCs, which is based on gene expression and cell number changes, is weak.

      Thanks for your comments pointing out both our contribution and its weakness. We acknowledge the lack of direct evidence in our data for the reversion of mutant Prog-C to GSC. We have now removed the claim about the repression of stemness by Foxl2l.

      (2) In addition, separation of early germ cell types in foxl2l mutant using marker genes from WT may not be optimal.

      The cell types of mutant cells were determined by two independent analyses. The first infers the developmental stage of mutant cells. This approach assumes that mutant cells can indeed be mapped to specific WT stages through their transcriptomic profiles. However, as indicated by this reviewer’s comments, mutant cells exhibited heterogeneity and can be distinct from WT cells, so defining cell types in mutants by WT markers may not be optimal. To address this, we conducted a second analysis, co-clustering. Mutant cells and WT cells at early stages (GSC, G-P, Prog-E, Prog-C(S) and Prog-C) were co-clustered. This approach does not assume a direct correspondence between mutant and WT developmental stages. Instead, it facilitates the identification of novel germ cell types in mutants while characterizing the relationship between WT and mutant cells. In some clusters, both WT and mutant cells were present, indicating high transcriptomic similarity. In other clusters, nearly all cells were mutant, indicating distinct mutant cell types (Figure 3C). We can, therefore, assign developmental properties to these mutant cells with confidence.
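
      The co-clustering readout described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the cluster labels, cell counts, the 0.9 cutoff and the function names below are all hypothetical, chosen only to show how the genotype composition of each co-cluster might separate shared clusters from mutant-dominated ones.

```python
from collections import Counter, defaultdict

def mutant_fraction(cells):
    """Return {cluster: fraction of mutant cells}.

    cells: iterable of (cluster_label, genotype) pairs,
    where genotype is "WT" or "mut" (hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for cluster, genotype in cells:
        counts[cluster][genotype] += 1
    return {c: n["mut"] / (n["WT"] + n["mut"]) for c, n in counts.items()}

def classify_clusters(cells, threshold=0.9):
    """Label each cluster by genotype composition.

    threshold is an assumed cutoff, not a value from the manuscript.
    """
    labels = {}
    for cluster, frac in mutant_fraction(cells).items():
        if frac >= threshold:
            labels[cluster] = "mutant-dominated"
        elif frac <= 1 - threshold:
            labels[cluster] = "WT-dominated"
        else:
            labels[cluster] = "shared"
    return labels

# Toy data (invented): three co-clusters with different genotype mixes.
toy_cells = (
    [("I1", "WT")] * 40 + [("I1", "mut")] * 35    # mixed cluster
    + [("I2", "WT")] * 1 + [("I2", "mut")] * 49   # almost only mutant cells
    + [("I3", "WT")] * 30 + [("I3", "mut")] * 2   # almost only WT cells
)
labels = classify_clusters(toy_cells)
```

      In this toy example, a "shared" label corresponds to clusters in which WT and mutant cells are transcriptomically similar, while a "mutant-dominated" label would flag a candidate mutant-specific cell type.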

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this study is to test the overarching hypothesis that plasticity in BNST CRF neurons drives distinct behavioral responses to unpredictable threat in males and females. The manuscript provides evidence for a possible sex-specific role for CRF-expressing neurons in the BNST in unpredictable aversive conditioning and subsequent hypervigilance across sexes. As the authors note, this is an important question given the high prevalence of sex differences in stress-related disorders, like PTSD, and the role of hypervigilance and avoidance behaviors in these conditions. The study includes in vivo manipulation, bulk calcium imaging, and cellular resolution calcium imaging, which yield important insights into cell-type specific activity patterns. However, it is difficult to generate an overall conclusion from this manuscript, given that many of the results are inconsistent across sexes and across tests and there is an overall lack of converging evidence. For example, partial conditioning yields increased startle in males but not females, yet CRF KO only increases startle response in males after full conditioning, not partial, and CRF neurons show similar activity patterns between partial and full conditioning across sexes. Further, while the study includes a KO of CRF, it does not directly address the stated aim of assessing whether plasticity in CRF neurons drives the subsequent behavioral effects of unpredictable threat.

      We appreciate the reviewer’s summary and agree that there is a large amount of complexity to the results, and that it was difficult to generate a simple model or conclusion to summarize our work. This is the unfortunate side effect of looking across both sexes at different conditioning paradigms; however, we believe that it is important to convey this information to the field even without a simple answer. Our data reinforce the very important findings from the Maren and Holmes groups that partial fear is a different process than full fear, and that the BNST plays a differential role here. We have reworded the manuscript to better convey this complexity.

      A major strength of this manuscript is the inclusion of both males and females and attention to possible behavioral and neurobiological differences between them throughout. However, to properly assess sex-differences, sex should be included as a factor in ANOVA (e.g. for freezing, startle, and feeding data in Figure 1) to assess whether there is a significant main effect or interaction with sex. If sex is not a statistically significant factor, both sexes should be combined for subsequent analyses. See, Garcia-Sifuentes and Maney, eLife 2021 https://elifesciences.org/articles/70817. There are additional cases where t-tests are used to compare groups when repeated measures ANOVAs would be more appropriate and rigorous.

      We agree with the reviewer that this is the more appropriate analysis and have changed the analysis and figures throughout the revised manuscript to better assess sex differences as well as differences between fear conditions.

      Additionally, it's unclear whether the two sexes are equally responsive to the shock during conditioning and if this is underlying some of the differences in behavioral and neuronal effects observed. There are some reports that suggest shock sensitivity differs across sexes in rodents, and thus, using a standard shock intensity for both males and females may be confounding effects in this study.

      This is a great point. We have conducted the appropriate analysis (Sex × Tone repeated-measures two-way ANOVAs for each of the groups: Ctrl, Full, Part), and there are no sex differences in freezing. The extent of conditioning does not differ between the groups, suggesting that if there was a difference in shock sensitivity, it is not driving any discernible differences in behavioral performance. However, it is possible that the experience of the shock differs for the animals even in the absence of any measurable behavior.

      The data do not rule out that BNST CRF activity is purely tracking the mobility state of the animal, given that the differences in activity also track with differences in freezing behavior. The data show an inverse relationship between activity and freezing. This may explain a paradox in the data, which is why males show a greater suppression of BNST activity after partial conditioning than full conditioning, if that activity is suspected to drive the increased anxiety-like response. Perhaps it reflects that activity is significantly suppressed at the end of the conditioning session because animals are likely to be continuously freezing after repeated shock presentations in that context. It would also explain why there is less of a suppression in activity over the course of the recall session, because there is less freezing as well during recall compared with conditioning.

      While it is possible that the BNST may be tracking activity, we believe it is not purely tracking mobility state. For instance, while freezing increases across tone exposures in Part fear regardless of sex, males show an increase while females show a reduction in BNST response during tone 5 (Fig 2K). The inverse relationship between BNST activity and freezing that the reviewer refers to would have predicted the opposite response if the BNST were purely tracking the mobility state of the animal. This is also the case for the BNST<sup>CRF</sup> response to the first and last tones during recall. Despite the suppression of activity over the course of recall (Fig 5K), we see an increase in the BNST<sup>CRF</sup> tone response when comparing tones 1 and 6 in males and a decrease in females (Fig 6M), again suggesting the BNST is responding to more than just activity.

      A mechanistic hypothesis linking BNST CRF neurons, the behavioral effects observed after fear conditioning, and manipulation of CRF itself are not clearly addressed here.

      We disagree with this assertion. The data suggest a model in which males respond with increased arousal, and Part fear males show persistent activation of the BNST and BNST<sup>CRF</sup> neurons during fear conditioning and recall, while female Part fear mice show the opposite response. This female response differs from what the field believes to be the role of the BNST in sustained fear. Additionally, we show that CRF knockdown does not affect fear differentiation or fear expression in males, while it enhances fear learning and recall in females. We have reworded the manuscript to highlight these novel findings.

      Reviewer #2 (Public Review):

      This study examined the role of CRF neurons in the BNST in both phasic and sustained fear in males and females. The authors first established a differential fear paradigm whereby shocks were consistently paired with tones (Full) or only paired with tones 50% of the time (Part), or controls who were exposed to only tones with no shocks. Recall tests established that both Full and Part conditioned male and female mice froze to the tones, with no difference between the paradigms. Additional studies using the NSF and startle tests established that neither fear paradigm produced behavioral changes in the NSF test, suggesting that these fear paradigms do not result in an increase in anxiety-like behavior. Part fear conditioning, but not Full, did enhance startle responses in males but not females, suggesting that this fear paradigm did produce sustained increases in hypervigilance in males exclusively.

      Thank you for this clear summary of the behavioral work.

      Photometry studies found that while undifferentiated BNST neurons all responded to shock itself, only Full conditioning in males led to a progressive enhancement of the magnitude of this response. BNST neurons in males, but not females, were also responsive to tone onset in both fear paradigms, but only in Full fear did the magnitude of this response increase across training. Knockdown of CRF from the BNST had no effect on fear learning in males or females, nor any effect in males on fear recall in either paradigm, but in females enhanced both baseline and tone-induced freezing only in the Part fear group. When looking at anxiety following fear training, it was found in males that CRF knockdown modulated anxiety in Part fear trained animals and amplified startle in Fully trained males but had no effect in either test in females. Using 1P imaging, it was found that CRF neurons in the BNST generally decline in activity across both conditioning and recall trials, with some subtle sex differences emerging in the Part fear trained animals in that in females BNST CRF neurons were inhibited after both shock and omission trials but in males this only occurred after shock and not omission trials. In recall trials, CRF BNST neuron activity remained higher in Part conditioned mice relative to Full conditioned mice.

      Overall, this is a very detailed and complex study that incorporates both differing fear training paradigms and males and females, as well as a suite of both state of the art imaging techniques and gene knockdown approaches to isolate the role and contributions of CRF neurons in the BNST to these behavioral phenomena. The strengths of this study come from the thorough approach that the authors have taken, which in turn helped to elucidate nuanced and sex specific roles of these neurons in the BNST to differing aspects of phasic and sustained fear. More so, the methods employed provide a strong degree of cellular resolution for CRF neurons in the BNST. In general, the conclusions appropriately follow the data, although the authors do tend to minimize some of the inconsistencies across studies (discussed in more depth below), which could be better addressed through discussion of these in greater depth. As such, the primary weakness of this manuscript comes largely from the discussion and interpretation of mixed findings without a level of detail and nuance that reflects the complexity, and somewhat inconsistency, across the studies. These points are detailed below:

      - Given the focus on CRF neurons in the BNST, it is unclear why the photometry studies were performed in undifferentiated BNST neurons as opposed to CRF neurons specifically (although this is addressed, to some degree, subsequently with the 1P studies in CRF neurons directly). This does limit the continuity of the data from the photometry studies to the subsequent knockdown and 1P imaging studies. The authors should address the rationale for this approach so it is clear why they have moved from broader to more refined approaches.

      The reviewer raises a good point. We did some preliminary photometry studies with BNST CRF neurons and found a poor time-locked signal. We reasoned that this was due to the heterogeneity of the cell activity, as we saw in our previous publication (Yu et al). Because of this, we moved to the 1P imaging work in place of continued BNST CRF photometry. We have also reworded the manuscript to better discuss the complexities and inconsistencies in findings across the studies.

      - The CRF KD studies are interesting, but it remains speculative as to whether these effects are mediated locally in the BNST or due to CRF signaling at downstream targets. As the literature on local pharmacological manipulation of CRF signaling within the BNST seems to be largely performed in males, the addition of pharmacological studies here would benefit this to help to resolve if these changes are indeed mediated by local impairments in CRF release within the BNST or not. While it is not essential to add these experiments, the manuscript would benefit from a more clear description of what pharmacological studies could be performed to resolve this issue.

      We agree with the reviewer that the addition of this experiment would be highly informative for differentiating the role of CRF in the BNST. This is something that will need to be considered moving forward and we have added this as a point of discussion.

      - While I can appreciate the authors' perspective, I think it is more appropriate to state that startle correlates with anxiety as opposed to outright stating that startle IS anxiety. Anxiety by definition is a behavioral cluster involving many outputs, of which avoidance behavior is key. Startle, like autonomic activation, correlates with anxiety but is not the same thing as a behavioral state of anxiety (particularly when the startle response dissociates from behavior in the NSF test, which more directly tests avoidance and apprehension). Throughout the manuscript the use of anxiety or vigilance to describe startle becomes interchangeable, but then the authors also dissociate these two, such as in the first paragraph of the discussion when stating that the Part fear paradigm produces hypervigilance in males without influencing fear or anxiety-like behaviors. The manuscript would benefit from harmonization of the language used to operationally define these behaviors and my recommendation would be to remain consistent with the description that startle represents hypervigilance and not anxiety, per se.

      The reviewer raises an excellent point. We have clarified this language in the revised manuscript.

      - The interpretation of the anxiety data following CRF KD is somewhat confusing. First, while the authors found no effect of fear training on behavior in the NSF test in the initial studies, now they do. However, somewhat contradictory to what one would expect, they found that Full fear trained males had reduced latency to feed (indicative of an anxiolytic response), which was unaltered by CRF KD, but in Part fear (which appeared to have no effect on its own in the NSF test), KD of CRF in these animals produced an anxiolytic effect. Given that the Part fear group was no different from control here, it is difficult to interpret these data, as now CRF KD does reduce latency to feed in this group, suggesting that removal of CRF now somehow conveys an anxiolytic response for Part fear animals. In the discussion the authors refer to this outcome as CRF KD "normalizing" the behavior in the NSF test of Part fear conditioned animals as now it parallels what is seen after Full fear, but given that the Part fear animals with GFP were no different than controls (and neither of these fear training paradigms produced any effect in the NSF test in the first arm of studies), it seems inappropriate to refer to this as "normalization" as it is unclear how this is now normalized. Given the complexity of these behavioral data, some greater depth in the discussion is required to put these data in context and describe the nuance of these outcomes; in particular, a discussion of possible experimental factors between the initial behavioral studies and those in the CRF KD arm that could explain the discrepancy in the NSF test would be good (such as the inclusion of surgery, or other factors that may have differed between these experiments). These behavioral outcomes are even more complex given that the opposite effect was found in startle whereby CRF KD amplified startle in Full trained animals. As such, this portion of the discussion requires some reworking to more adequately address the complexity of these behavioral findings.

      The reviewer raises a good point, and we agree that there are many inconsistencies in the behaviors. We believe it is still valuable to show these results, and we have expanded the manuscript's discussion of potential reasons for these behavioral inconsistencies.

      Reviewer #3 (Public Review):

      Hon et al. investigated the role of BNST CRF signaling in modulating phasic and sustained fear in male and female mice. They found that partial and full fear conditioning had similar effects in both sexes during conditioning and during recall. However, males in the partially reinforced fear conditioning group showed enhanced acoustic startle compared to the fully reinforced fear conditioning group, an effect not seen in females. Using fiber photometry to record calcium activity in all BNST neurons, the authors show that the BNST was responsive to foot shock in both sexes and both conditioning groups. Shock response increased over the session in males in the fully conditioned fear group, an effect not observed in the partially conditioned fear group. This effect was not observed in females. Additionally, tone onset resulted in increased BNST activity in both male groups, with the tone response increasing over time in the fully conditioned fear group. This effect was less pronounced in females, with partially conditioned females exhibiting a larger BNST response. During recall in males, BNST activity was suppressed below baseline during tone presentations and was significantly greater in the partially conditioned fear group. Both female groups showed an enhanced BNST response to the tone that slowly decayed over time. Next, they knocked down CRF in the BNST to examine its effect on fear conditioning, recall and anxiety-like behavior after fear. They found no effect of the knockdown in either sex or group during fear conditioning. During fear recall, BNST CRF knockdown led to an increase in freezing in only the partially conditioned females. In the anxiety-like behavior tasks, BNST CRF knockdown led to increased anxiolysis in the partially reinforced fear males, but not in females. Surprisingly, BNST CRF knockdown increased startle response in fully conditioned, but not partially conditioned, males, an effect not observed in either female group. In a final set of experiments, the authors used single-photon calcium imaging to record BNST CRF cell activity during fear conditioning and recall. Approximately 1/3 of BNST CRF cells were excited by shock in both sexes, with the rest inhibited, and no differences were observed between sexes or groups during fear conditioning. During recall, BNST CRF activity decreased in both sexes, an effect pronounced in male and female fully conditioned fear groups.

      Overall, these data provide novel, intriguing evidence in how BNST CRF neurons may encode phasic and sustained fear differentially in males and females. The experiments were rigorous.

      We thank you for this positive review of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are several graphs representing different analyses of (presumably) the same group of subjects, but which have different N/group. For example, in Figure 2:

      (1) Fig 2P seems to have n=10 in Part Male group (Peak), but 2Q only has n=9 in Part Male group (AUC)

      (2) Fig 2S seems to have n=10 in Part Female group (Peak), but 2T only has n=7 in Part Female group (AUC)

      (3) Fig 2G (Tone Resp) has n=6 Full Males but 2F (Tone Resp), 2H (Shock Resp), and 2I (Shock Resp) have n=7 Full Males

      (4) Fig 2K (Tone Resp) has n=7 Full Females but 2L (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=8 Full Females

      (5) Fig 2L (Tone Resp) has n=9 Part Females but 2K (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=10 Part Females

      It's possible that this is just due to overlapping individual data points which are made harder to see due to the low resolution of the figures. If so, this can be easily rectified. However, there may also be subjects missing from some analyses which must be clarified or corrected.

      We thank you for catching these. We have gone through and fixed any issues with data points and have added statistics and exclusions in datasets to figure legends to further explain inconsistencies.

      Regarding statistical tests:

      (2) Data in Figs 2G and 2I should be analyzed using a two-way RM ANOVA.

      We have now included sex as a factor in most of our analysis and are now using appropriate statistical tests.

      (3) Data in Fig 3K should be analyzed using a two-way RM ANOVA.

      We are now using appropriate statistical tests.

      Calcium activity in response to the shock during conditioning and in response to the tone during recall should be included in Figure 5. Given partial and full animals also receive unequal presentations of the cue, it would be useful to see the effects trial by trial or normalized to the first 3 presentations only.

      The reviewer raises a great point. We have changed this figure and have now added the response to shock and tones. Since we are most interested in the difference between sustained and phasic fear, we decided to compare tone 3 in Full fear and tone 4 in Part fear, which differ in the ambiguity of their cue and only have one tone difference.

      Histology maps should be included for all experiments depicting viral spread and implant location for all animals, in addition to the included representative histology images. These can be placed in the supplement.

      We agree this is helpful. While we have confirmed all of the experiments are hits, the tissue is no longer in condition for this analysis.

      Referring to the quantification of peaks in fiber photometry and cellular resolution calcium imaging data as "spikes" is a bit misleading given the inexact relationship between GCAMP sensor dynamics/calcium binding and neuronal action potentials; calling it "event" frequency would perhaps be clearer.

      We have changed the references of spikes to events as suggested.

      The legend for Figure 2S is mislabeled as A.

      Thank you for catching this mistake, it has been fixed.

      The methods refer to CRFR1 fl/fl animals but it seems no experiments used these animals, only CRF fl/fl.

      We have fixed this, thank you.

      Reviewer #2 (Recommendations For The Authors):

      As stated in the public review, while I think the addition of local pharmacological studies blocking CRF1 and 2 receptors in the BNST in both males and females, done under the same conditions as all of the other testing herein, would help to resolve some of the speculation in interpreting the CRF KD data, I don't think these studies are essential to do, but it would be good for the authors to more explicitly state what studies could be done and how they could facilitate interpretation of these data.

      Thank you for this suggestion. We have added this discussion into the manuscript.

      Aside from this, my other recommendations for the authors are to more clearly address the discrepancies in behavioral outcomes across studies, to explicitly describe their rationale for the sequence of experiments performed, and to harmonize their operationalization of how they define anxiety.

      Again, we appreciate these great suggestions. We have added more discussion on the behavioral discrepancies as well as rationale for the experiments. We have also changed the wording to remain consistent that the NSF test relates to anxiety and the Startle test relates to vigilance.

      - In Figure 2, Panel S is listed as Panel A in the caption and should be corrected.

      Thank you for catching this mistake, we have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      My biggest concerns I have regard the interpretations and some conclusions from this data set, which I have stated below.

      (1) It was surprising to see minimal and somewhat conflicting behavioral effects due to BNST CRF knockdown. The authors provide a representative image and address this in the conclusion. They mention the role of local vs projection CRF circuits as well as the role of GABA. I don't think those experiments are necessary for this manuscript. However, it may be worthwhile to see through in situ hybridization or IHC, to see BNST CRF levels after both full and partial conditioned fear paradigms. Additionally, it would help to see a quantification of the knockdown of the animals.

      Thank you for these great suggestions. We will consider these for future experiments. We piloted some CRF sensor experiments to probe this, but it was unclear whether the signal-to-noise ratio of the sensor was sufficient. We hope to do more of this in the future if we ever manage to get funding for this work.

      The authors can add a figure showing deltaF/F changes from control.

      We did not have control mice in these in-vivo experiments. Our main interests lie in understanding the differences in Full and Part Fear conditioning paradigms specifically.

      (2) Related to the previous point, it was surprising to see an effect of the CRF deletion in the full fear group compared to the partial fear in the acoustic startle task. To strengthen the conclusion about differential recruitment of CRF during phasic and sustained fear, the experiment in my previous point could help elucidate that. Conversely, intra-BNST administration of a CRF antagonist before the acoustic startle after both conditioning tasks could also help. Or patch from BNST CRF neurons after the conditioning tasks to measure intrinsic excitability. Not all of these experiments are needed to support the conclusion; they are just some examples.

      We thank the reviewer for these suggestions and agree that these are important experiments. We will consider this in future experiments exploring the role of BNST CRF in fear conditioning.

      (3) In Figure 5 F and K, the authors report data combined for both part and full fear conditioning. Were there any differences between the number of excited or inhibited neurons between the conditioning groups?

      We are only looking at the first shock exposure in these figures. These were combined because the first tone and shock exposure is identical in Full and Part fear conditioning. Differences in these behavioral paradigms emerge after Tone 3 exposure, where Part fear does not receive a shock while Full fear does.

      Also, can the authors separate male and female traces in Fig 5 E and P?

      Traces in Fig 5E are from females only. We did not include male traces because males and females had identical responses to first shock, and we felt only one trace was needed as an example. Traces in Fig 5P are from males. We did not show female traces because females did not show differential effects from baseline to end.

      (4) Also, regarding the calcium imaging data, what was the average length of a transient induced by shock? Were there any differences between the sexes?

      We recorded many cells in each condition, and the duration of the shock-evoked transients varied widely and was hard to quantify; for example, some cells were already active before the shock, which makes the length of the transient ambiguous. Therefore, to reduce ambiguity, we kept the analysis window consistent across mice and conditions by focusing on the 10-second window after shock.
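
      As an illustration only, a fixed post-shock window of the kind described in this response can be summarized per cell as follows. This is a minimal sketch, not the authors' analysis code: the function name, the sampling rate `fs`, the shock time, and the choice of peak and trapezoidal area under the curve as summary measures are all assumptions made for the example.

```python
import numpy as np

def post_event_summary(trace, fs, event_time_s, window_s=10.0):
    """Peak and area under the curve in a fixed window after an event.

    trace        : 1-D array of dF/F values for one cell (hypothetical input)
    fs           : sampling rate in Hz (assumed, not given in the source)
    event_time_s : event (e.g. shock) onset in seconds from trace start
    window_s     : window length in seconds (10 s, as in the response)
    """
    trace = np.asarray(trace, dtype=float)
    start = int(round(event_time_s * fs))
    n_samples = int(round(window_s * fs))
    # Include the end point so the integral spans exactly window_s seconds.
    segment = trace[start:start + n_samples + 1]
    peak = float(segment.max())
    # Trapezoidal rule with sample spacing 1 / fs.
    auc = float(np.sum((segment[1:] + segment[:-1]) / 2.0) / fs)
    return peak, auc
```

      Using the same window for every cell avoids having to define where each transient starts and ends, at the cost of mixing transients of different lengths within one summary value.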

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, Osiurak and colleagues investigate the neurocognitive basis of technical reasoning. They use multiple tasks from two neuroimaging studies and overlap analysis to show that the area PF is central for reasoning, and plays an essential role in tool-use and non-tool-use physical problem-solving, as well as both conditions of mentalizing task. They also demonstrate the specificity of the technical reasoning and find that the area PF is not involved in the fluid-cognition task or the mentalizing network (INT+PHYS vs. PHYS-only). This work suggests an understanding of the neurocognitive basis of technical reasoning that supports advanced technologies.

      Strengths:

      -The topic this study focuses on is intriguing and can help us understand the neurocognitive processes involved in technical reasoning and advanced technologies.

      -The researchers obtained fMRI data from multiple tasks. The data is rich and encompasses the mechanical problem-solving task, psychotechnical task, fluid-cognition task, and mentalizing task.

      -The article is well written.

      We sincerely thank Reviewer 1 for their positive and very helpful comments, which helped us improve the MS. Thank you.

      Weaknesses:

      - Limitations of the overlap analysis method: there are multiple reasons why two tasks might activate the same brain regions. For instance, the two tasks might share cognitive mechanisms, the activated regions of the two tasks might be adjacent but not overlapping at finer resolutions, or the tasks might recruit the same regions for different cognition functions. Thus, although overlap analysis can provide valuable information, it also has limitations. Further analyses that capture the common cognitive components of activation across different tasks are warranted, such as correlating the activation across different tasks within subjects for a region of interest (i.e. the PF).

      We thank Reviewer 1 for this comment. We added new analyses to address the two alternative interpretations stressed here by Reviewer 1, namely, the same-region-but-different-function interpretation and the adjacency interpretation. The new analyses ruled out both alternative interpretations, thereby reinforcing our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to assure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognition functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but did not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., difference in terms of mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental condition minus the control condition) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)
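
      The between-task correlation analysis quoted above can be sketched as follows. This is a hedged illustration, not the authors' code: the function name and the dictionary layout are hypothetical, the values are simulated placeholders rather than the study's data, and plain Pearson correlation is an assumption, since the quoted passage does not name the correlation coefficient used.

```python
import numpy as np
from itertools import combinations

def pairwise_correlations(roi_effects):
    """Pearson r and its t statistic for every pair of tasks.

    roi_effects maps a task name to one value per participant
    (experimental minus control parameter estimate in the ROI),
    with participants in the same order for every task.
    """
    results = {}
    for task_a, task_b in combinations(roi_effects, 2):
        a = np.asarray(roi_effects[task_a], dtype=float)
        b = np.asarray(roi_effects[task_b], dtype=float)
        r = float(np.corrcoef(a, b)[0, 1])
        n = a.size
        # t statistic for H0: r = 0, with n - 2 degrees of freedom.
        if abs(r) < 1.0:
            t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
        else:
            t = np.sign(r) * np.inf
        results[(task_a, task_b)] = (r, float(t))
    return results

# Simulated placeholder values (NOT the study's data); 30 hypothetical participants.
rng = np.random.default_rng(0)
simulated = {task: rng.normal(size=30)
             for task in ("psychotechnical", "phys_only", "int_phys", "fluid")}
for pair, (r, t) in pairwise_correlations(simulated).items():
    print(pair, round(r, 2), round(t, 2))
```

      Because every participant contributes one value per task, the correlation is computed across participants within the same ROI, which is why the mechanical problem-solving task, collected in a different sample, cannot enter this analysis.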

      Control tasks may be inadequate: the tasks may involve other factors, such as motor/action-related information. For the psychotechnical task, fluid-cognition task, and mentalizing task, the experimental tasks require processing not only technical-cognition information but also motor-related information, whereas the control tasks do not require motor-related information (mainly visual shape information). Additionally, there may be no difference in motor-related information between the conditions of the fluid-cognition task. Therefore, the regions of interest may be sensitive to motor-related information, affecting the research conclusion.

      We thank Reviewer 1 for this comment. We added a specific section in the discussion that addresses this limitation.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of tractor, boat, pulley, or cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      -Negative results require further validation: the cognitive results for the fluid-cognition task in the study may need more refinement. For instance, when performing ROI analysis, are there any differences between the conditions? Bayesian statistics might also be helpful to account for the negative results.

      We agree that our negative results required further validation. We conducted the ROI analyses suggested by Reviewer 1, which confirmed the initial whole-brain analyses.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we did not report any specific activation of the left area PF in the fluid-cognition task contrary to the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm in the MNI standard space (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)
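
      For readers unfamiliar with literature-defined spherical ROIs, the construction described in the quoted passage can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function name is hypothetical, and the 2 mm isotropic grid and its affine are assumed for the example (the quoted text gives only the centre and the radius).

```python
import numpy as np

def sphere_mask(shape, affine, center_mm, radius_mm):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm."""
    # Voxel index grid (i, j, k) for every voxel in the volume.
    ijk = np.indices(shape).reshape(3, -1)
    # Homogeneous coordinates so the 4x4 affine maps voxel indices to mm.
    ijk1 = np.vstack([ijk, np.ones((1, ijk.shape[1]))])
    xyz = (affine @ ijk1)[:3]
    center = np.asarray(center_mm, dtype=float)[:, None]
    dist = np.linalg.norm(xyz - center, axis=0)
    return (dist <= radius_mm).reshape(shape)

# Assumed 2 mm isotropic grid in MNI space (91 x 109 x 91 voxels).
affine = np.array([[-2.0, 0.0, 0.0, 90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])
# Centre and radius as stated in the quoted passage.
mask = sphere_mask((91, 109, 91), affine, center_mm=(-59, -31, 40), radius_mm=12)

def roi_mean(contrast_map, roi):
    """Mean of a 3-D contrast map over the voxels selected by the ROI mask."""
    return float(np.asarray(contrast_map)[roi].mean())
```

      Averaging each participant's experimental-minus-control map inside this mask yields one value per participant and task, which is the quantity the ROI-level tests operate on.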

      Reviewer #1 (Recommendations For The Authors):

      (1) I may not fully grasp some of the arguments. In the abstract, what does the term "intermediate-level" mean, and why is it an intermediate-level state? In the sentence "the existence of a specific cognitive module in the human brain dedicated to materiality", I cannot see a clear link between technical cognition and the word "materiality".

      We used the term materiality to refer to a potential human trait that allows us to shape the physical world according to our ends, by using and making tools and transmitting them to others. This is a reference to Allen et al. (2020; PNAS): “We hope this empirical domain and modeling framework can provide the foundations for future research on this quintessentially human trait: using, making, and reasoning about tools and more generally shaping the physical world to our ends” (p. 29309). Scientists (including archaeologists, economists, psychologists, neuroscientists) interested in human materiality have tended to focus on how we manipulate things according to our thought (motor cognition) or how we conceptualize our behaviour to transmit it to others (language, social cognition). However, little has been said on the intermediate level, that is, technical cognition. We added the term “technical cognition” here, which should help to make the connection more quickly.

      “Yet, little has been said about the intermediate-level cognitive processes that are directly involved in mastering this materiality, that is, technical cognition.” (p. 2)

      (2) The introduction could provide more details on why the issue of "generalizability and specificity" is important to address, to clarify the significance of the research question.

      We followed this comment and added a sentence to explain why it is important to address this research question. Again, we thank Reviewer 1 for their helpful comments.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Reviewer #2 (Public Review):

      Summary:

      The goal of this project was to test the hypothesis that a common neuroanatomic substrate in the left inferior parietal lobule (area PF) underlies reasoning about the physical properties of actions and objects. Four functional MRI (fMRI) experiments were created to test this hypothesis. Group contrast maps were then obtained for each task, and overlap among the tasks was computed at the voxel level. The principal finding is that the left PF exhibited differentially greater BOLD response in tasks requiring participants to reason about the physical properties of actions and objects (referred to as technical reasoning). In contrast, there was no differential BOLD response in the left PF when participants engaged in an fMRI variant of the Raven's progressive matrices to assess fluid cognition.

      Strengths:

      This is a well-written manuscript that builds from extensive prior work from this group mapping the brain areas and cognitive mechanisms underlying object manipulation, technical reasoning, and problem-solving. Major strengths of this manuscript include the use of control conditions to demonstrate there are differentially greater BOLD responses in area PF over and above the baseline condition of each task. Another strength is the demonstration that area PF is not responsive in tasks assessing fluid cognition - e.g., it may just be that PF responds to a greater extent in a harder condition relative to an easy condition of a task. The analysis of data from Task 3 rules out this alternative interpretation. The methods and analysis are sufficiently written for others to replicate the study, and the materials and code for data analysis are publicly available.

      We sincerely thank Reviewer 2 for their precious comments, which helped us improve the MS. 

      Weaknesses:

      The first weakness is that the conclusions of the manuscript rely on there being overlap among group-level contrast maps presented in Figure 2. The problem with this conclusion is that different participants engaged in different tasks. Never is an analysis performed to demonstrate that the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4.

      We added new analyses that demonstrated that “the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4”. We thank Reviewer 2 for this comment, because these new analyses reinforced our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to assure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognition functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but did not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., difference in terms of mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental condition minus the control condition) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)

      A second weakness is that there is variance in accuracy between tasks that is not addressed. It is clear from the plots in the supplemental materials that some participants score below chance (~ 50%). This means that half (or more) of the fMRI trials of some participants are incorrect. The methods section does not mention how inaccurate trials were handled. Moreover, if 50% is chance, it suggests that some participants did not understand task instructions and were systematically selecting the incorrect item.

      It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. To examine whether this potential difficulty effect biased our interpretation, we conducted new ROI analyses by removing all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation.

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task. We did not report a specific activation of the left area PF in this task while its experimental condition was more difficult than its control condition. To test more directly this effect of difficulty, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)
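
      The exclusion step described in the quoted passage can be sketched as follows. This is a hedged illustration rather than the authors' code: the function and variable names are hypothetical, and the one-sample t statistic against zero is an assumption, since the quoted text does not name the test applied to the ROI values.

```python
import numpy as np

def roi_t_after_exclusion(roi_effect, accuracy, chance=0.5):
    """One-sample t statistic on ROI effects after dropping low performers.

    roi_effect : experimental minus control ROI value, one per participant
    accuracy   : proportion correct in the experimental condition, same order
    chance     : participants at or below this level are excluded
    """
    roi_effect = np.asarray(roi_effect, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    # Keep only participants who performed strictly above chance.
    kept = roi_effect[accuracy > chance]
    n = kept.size
    # Test of the mean effect against zero, with n - 1 degrees of freedom.
    t = kept.mean() / (kept.std(ddof=1) / np.sqrt(n))
    return float(t), n - 1, int(n)
```

      Running the same test with and without the exclusion, as the authors report doing, shows whether the ROI effect depends on participants who may not have followed the task.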

      A third weakness is related to the fluid cognition task. In the fMRI task developed here, the participant must press a left or right button to select between 2 rows of 3 stimuli while only one of the 3 stimuli is the correct target. This means that within a 10-second window, the participant must identify the pattern in the 3x3 grid and then separately discriminate among 6 possible shapes to find the matching stimulus. This is a hard task that is qualitatively different from the other tasks in terms of the content being manipulated and the time constraints.

      We acknowledge that the fluid-cognition task involved a design that differed from the other tasks. However, this was also true for the other tasks, as the design also differed between the mechanical problem-solving task, the psychotechnical task, and the mentalizing task. Nevertheless, despite these distinctions, we found a consistent activation of the left area PF in these tasks with different designs including in the psychotechnical task, which seemed as difficult as the fluid-cognition task.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we did not report any specific activation of the left area PF in the fluid-cognition task contrary to the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm in the MNI standard space (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)

      In sum, this is an interesting study that tests a neuro-cognitive model whereby the left PF forms a key node in a network of brain regions supporting technical reasoning for tool and non-tool-based tasks. Localizing area PF at the level of single participants and managing variance in accuracy is critically important before testing the proposed hypotheses.

      We thank Reviewer 2 for this positive evaluation and their suggestions. As detailed in our response, our revision took into consideration both the localization of the left area PF at the level of single participants and the variance in accuracy. 

      Reviewer #2 (Recommendations For The Authors):

      Did the fMRI data undergo high-pass temporal filtering prior to modeling the effects of interest? Participants engaged in a long (17-24 minutes) run of fMRI data collection. High-pass filtering of the data is critically important when managing temporal autocorrelation in the fMRI response (e.g., see Shinn et al., 2023, Functional brain networks reflect spatial and temporal autocorrelation. Nature Neuroscience).

      Yes. We added this information.

      “Regressors of non-interest resulting from 3D head motion estimation (x, y, z translation and three axes of rotation) and a set of cosine regressors for high-pass filtering were added to the design matrix.” (p. 25-26)
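
      To make concrete what "a set of cosine regressors for high-pass filtering" means, the sketch below builds a discrete cosine basis and shows how it removes slow drift. It is an illustration under stated assumptions, not the authors' design matrix: the function names are hypothetical, and the repetition time (2 s), number of volumes (600, i.e. a 20-minute run) and cutoff period (128 s) are assumed values that the quoted text does not provide.

```python
import numpy as np

def cosine_drift_regressors(n_scans, tr, cutoff_s=128.0):
    """Discrete cosine regressors with periods of at least cutoff_s seconds."""
    # Component k has a period of 2 * n_scans * tr / k seconds.
    order = int(np.floor(2.0 * n_scans * tr / cutoff_s))
    n = np.arange(n_scans)
    k = np.arange(1, order + 1)
    # DCT-II basis: column k completes k half-cycles over the run.
    basis = np.cos(np.pi * (2 * n[:, None] + 1) * k[None, :] / (2.0 * n_scans))
    # Scaling that makes the columns orthonormal.
    return np.sqrt(2.0 / n_scans) * basis

def remove_drift(signal, drift):
    """Project the slow components out of a time series."""
    # Columns are orthonormal, so the projection needs no matrix inverse.
    return signal - drift @ (drift.T @ signal)

# Assumed acquisition parameters, for illustration only.
drift = cosine_drift_regressors(n_scans=600, tr=2.0, cutoff_s=128.0)
```

      Adding these columns to the design matrix is equivalent to high-pass filtering the data and the task regressors with the same cutoff, so slow scanner drift does not leak into the estimates of interest.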

      Including scales in Figure 2 would help the reader interpret the magnitude of the BOLD effects.

      We added this information in Figure 3 (Figure 2 in the initial version of the MS).

      It was difficult to inspect the small thumbnail images of the task stimuli in Figure 1. Higher resolution versions of those stimuli would help facilitate understanding of the task design and trial structure.

      We changed both Figure 1 and Figure S1.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports two neuroimaging experiments assessing commonalities and differences in activation loci across mechanical problem-solving, technical reasoning, fluid cognition, and "mentalizing" tasks. Each task includes a control task. Conjunction analyses are performed to identify regions in common across tasks. As Area PF (a part of the supramarginal gyrus of the inferior parietal lobe) is involved across 3 of the 4 tasks, the investigators claim that it is the hub of technical cognition.

      Strengths:

      The aim of finding commonalities and differences across related problem-solving tasks is a useful and interesting one.

      The experimental tasks themselves appear relatively well-thought-out, aside from the concern that they are differentially difficult.

      The imaging pipeline appears appropriate.

      We thank Reviewer 3 for their constructive comments, which helped us improve the MS.

      Weaknesses:

      (1) Methodological

      As indicated in the supplementary tables and figures, the experimental tasks employed differ markedly in 1) difficulty and 2) experimental trial time. Response latencies are not reported (but are of additional concern given the variance in difficulty). There is concern that at least some of the differences in activation patterns across tasks are the result of these fundamental differences in how hard various brain regions have to work to solve the tasks and/or how much of the trial epoch is actually consumed by "on-task" behavior. These difficulty issues should be controlled for by 1) separating correct and incorrect trials, and 2) for correct trials, entering response latency as a regressor in the Generalized Linear Models, 3) entering trial duration in the GLMs.

      We thank Reviewer 3 for this comment. It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. We could not conduct new analyses by separating correct and incorrect trials because, for each task, participants had to respond only on the last item of the block. Therefore, we did not record a response for each event. Nevertheless, we could examine whether this potential difficulty effect biased our interpretation, by conducting new ROI analyses in which we removed all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation. 

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task. We did not report a specific activation of the left area PF in this task while its experimental condition was more difficult than its control condition. To test more directly this effect of difficulty, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)
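
      The exclusion step described in this passage can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the participant records, the field names (`accuracy`, `roi_contrast`), and the 0.5 chance level are assumptions made for the example, and the one-sample t statistic is computed by hand so that the block needs only the standard library.

```python
import math
import statistics

CHANCE_LEVEL = 0.5  # assumed chance level for a two-alternative task


def exclude_at_or_below_chance(participants, chance=CHANCE_LEVEL):
    """Keep only participants whose accuracy is strictly above chance."""
    return [p for p in participants if p["accuracy"] > chance]


def one_sample_t(values):
    """Return (t, df) for a one-sample t-test of the mean against zero."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical data: accuracy in the experimental condition and the
# experimental-minus-control contrast estimate in the ROI.
participants = [
    {"id": "s01", "accuracy": 0.80, "roi_contrast": 0.42},
    {"id": "s02", "accuracy": 0.50, "roi_contrast": 0.10},
    {"id": "s03", "accuracy": 0.65, "roi_contrast": 0.35},
    {"id": "s04", "accuracy": 0.45, "roi_contrast": 0.05},
    {"id": "s05", "accuracy": 0.90, "roi_contrast": 0.51},
]

kept = exclude_at_or_below_chance(participants)
t_all, df_all = one_sample_t([p["roi_contrast"] for p in participants])
t_kept, df_kept = one_sample_t([p["roi_contrast"] for p in kept])
print(f"all participants: t({df_all}) = {t_all:.2f}")
print(f"above chance only: t({df_kept}) = {t_kept:.2f}")
```

      If the effect in the region of interest remains after the exclusion, as the authors report for their data, the activation is unlikely to be driven by participants who performed at chance.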

      A related concern is that the control tasks also differ markedly in the degree to which they were easier and faster than their corresponding experimental task. Thus, some of the control tasks seem to control much better for difficulty and time on task than others. For example, the control task for the psychotechnical task simply requires the indication of which array contains a simple square shape (i.e., it is much easier than the psychotechnical task), whereas the control task for mechanical problem-solving requires mentally fitting a shape into a design, much like solving a jigsaw puzzle (i.e., it is only slightly easier than the experimental task).

      It is true that some control conditions could be easier than other ones. These differences reinforced the common activation found in the left area PF in the tasks hypothesized as involving technical reasoning, because this activation survived irrespective of the differences in terms of experimental design. For us, the rationale is the same as for a meta-analysis, in which we try to find what is common to a great variety of tasks. The only detrimental consequence we identified here is that this difference explained why we did not report a specific activation of the left area PF in the fluid-cognition task, as if the left area PF was more responsive when the task was difficult. This possibility assumes that the experimental condition of the fluid-cognition task is much more difficult than its control condition compared to what can be seen in the other tasks. As Reviewer 2 stressed in Point 1, this interpretation is unlikely, because the differences between the experimental and control conditions in the mechanical problem-solving and psychotechnical tasks were similar to those in the fluid-cognition task. In addition, again, the new ROI analyses in which we removed all the participants who performed at or below the chance level in experimental conditions reproduced our initial results.

      (2) Theoretical 

      The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. Some claims need to be revised/softened.

      As this comment is also mentioned below, please find our response to it below.

      Reviewer #3 (Recommendations For The Authors):

      (1) Because of the high level of detail, Figures 1 and S2 (particularly the mentalizing task and mechanical problem-solving task, and their controls) are very hard to parse, even when examined relatively closely. It is suggested that these figures be broken down into separate panels for Experiment 1 and Experiment 2 to facilitate understanding.

      We changed both Figure 1 and Figure S1.

      (2) The behavioral data (including response latencies) should be reported in the main results section of the paper and not in a supplement.

      The behavioural data are now reported in the main results. We did not report response latencies because participants were not prompted to respond as quickly as possible.

      “Behavioural results. All the behavioural results are given in Fig. 2. As shown, scores were higher in the experimental conditions than for the control conditions for all the tasks (all p < .05). In other words, the experimental conditions were more difficult than the control conditions. This difference in terms of difficulty can also be illustrated by the fact that some participants performed at or below the chance level in the experimental conditions whereas none did so in the control conditions.” (p. 8)

      (3) The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. For example, claims that need to be revised/softened include:

      Abstract: "Area PF... can work along with social-cognitive skills to resolve day-to-day interactions that combine social and physical constraints". This statement is overly speculative.

      This statement is based on the fact that we reported a combined activation of the technical-reasoning network and the mentalizing network in the INT+PHYS condition of the mentalizing task. This suggests that both networks need to work together to solve a day-to-day problem in which both the physical constraints of the situation and the intention of the individual must be integrated. Our findings replicated previous ones with a similar task (e.g., Brunet et al. 2000; Völlm et al., 2006), in which the authors gave an interpretation similar to ours in considering that this task requires understanding physical and social causes. Perhaps the reference to the results of the mentalizing task was not explicit enough. We added “day-to-day” before “problem” in the part of the discussion in which we discuss this possibility to make this aspect clearer.

      “In broad terms, the results of the mentalizing task indicate that causal reasoning has distinct forms and that it recruits distinct networks of the human brain (Social domain: Mentalizing; Physical domain: Technical reasoning), which can nevertheless interact together to solve day-to-day problems in which several domains are involved, such as in the INT+PHYS condition of the mentalizing task.” (p. 16)

      Introduction: "The manipulation-based approach... remains silent on the more general cognitive mechanisms...that must also encompass the use of unfamiliar or novel tools". This statement seems to be based on an overly selective literature review. There are a number of studies in which the relationship between a novel and familiar tool selection/use has been explored (e.g., Buchman & Randerath, 2017; Mizelle & Wheaton, 2010; Silveri & Ciccarelli, 2009; Stoll, Finkel et al., 2022; Foerster, 2023; Foerster, Borghi, & Goslin, 2020; Seidel, Rijntjes et al., 2023).

      We thank Reviewer 3 for this comment. Even if we accept the idea that we possess specific sensorimotor programs about tool manipulation, it remains that these programs cannot explain how an individual decides to bend a wire to make a hook or to pour water into a container to retrieve a target. As a matter of fact, such behaviour has been reported in nonhuman animals, such as crows (Weir et al., 2002, Nature) or orangutans (Mendes et al., 2007, Biology Letters). In these studies, the question is whether these nonhuman animals understand the physical causes or not, but the question of sensorimotor programs is never addressed (to our knowledge). This is also true in developmental studies on tool use (e.g., Beck et al., 2011, Cognition; Cutting et al., 2011, Journal of Experimental Child Psychology). This is what we meant here, that is, the manipulation-based approach is not equipped to explain how people solve physical problems by using or making tools – or any object – or by building constructions or producing technical innovations. However, we agree that some papers have been interested in exploring the link between common and novel tool use and have suggested that both could recruit common sensorimotor programs. It is noteworthy that these studies do not test the predictions from the manipulation-based approach versus the reasoning-based approach, so both interpretations are generally viable as stressed by Seidel et al. (2023), one of the papers recommended by Reviewer 3.

      “Apparently, the presentation of a graspable object that is recognizable as a tool is sufficient to provoke SMG activation, whether one tends to see the function of SMG to be either “technical reasoning” (Osiurak and Badets 2016; Reynaud et al. 2016; Lesourd et al. 2018; Reynaud et al. 2019) or “manipulation knowledge” (Sakreida et al. 2016; Buxbaum 2017; Garcea et al. 2019b).” (Seidel et al., 2023; p. 9)

      Regardless, as suggested by Reviewer 3, these papers deserve to be cited and this part needed to be rewritten to insist on the “making, construction, and innovation” dimension more than on the “unfamiliar and novel tool use” dimension to avoid any ambiguity.

      “This manipulation-based approach has provided interesting insights (12–16) and even elegant attempts to explain how these sensorimotor programs could support the use of both unfamiliar or novel tools (17–20), but remains silent on the more general cognitive mechanisms behind human technology that include the use of common and unfamiliar or novel tools but must also encompass tool making, construction behaviour, technical innovations, and transmission of technical content.” (p. 3)

      Introduction: "Here we focus on two important questions... to promote the technical-reasoning hypothesis as a comprehensive cognitive framework..." (italics added). This and other similar statements should be rewritten as testable scientific hypotheses rather than implying that the point of the research is to promote the investigators' preferred view.

      We agree that our phrasing could seem inappropriate here. What we meant here is that the technical-reasoning hypothesis could become an interesting framework for the study of the cognitive bases of human technology only if we are able to verify some of its key facets. As suggested, we rewrote this part. We also rewrote the abstract and the first paragraph of the discussion.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Introduction: The Goldenberg and Hagmann paper cited actually shows that familiar tool use may be based either on retrieval from semantic memory or by inferring function from structure (mechanical problem solving); in other words, the investigators saw a role for both kinds of information, and the relationship between mechanical problem solving and familiar tool use was actually relatively weak. This requires correction.

      We disagree with Reviewer 3 on this point. The whole sentence is as follows:

      “This silence has been initially broken by a series of studies initiated by Goldenberg and Hagmann (9), which has documented a behavioural link in left brain-damaged patients between common tool use and the ability to solve mechanical problems by using and even sometimes making novel tools (e.g., extracting a target out from a box by bending a wire to create a hook) (9, 17).” (p. 3-4)

      We did not mention the interpretations given by Goldenberg and Hagmann about the link with the pantomime task, but only focused on the link they reported between common tool use and novel tool use. This is factual. In addition, we also disagree that the link between common tool use and novel tool use was weak.

      “The hypothesis put forward in the introduction predicts that knowledge about prototypical tool use assessed by pantomime of tool use and the ability to infer function from structure assessed by novel tool selection can both contribute to the use of familiar tools. Indeed results of both tests correlated significantly with the use of familiar tools (pantomime of tool use: r = 0.77, novel tool selection: r = 0.62; both P < 0.001), but there was also a significant correlation between the two tests (r = 0.64, P < 0.001).” (Goldenberg & Hagmann, 1998; p. 585)

      As can be seen in this quote, they reported a significant correlation between novel tool selection and the use of familiar tools. It is also noteworthy that the novel tool selection test and the pantomime test correlated with each other. Georg Goldenberg told one of the authors (F. Osiurak; personal communication) that this result prompted him to revise his idea that pantomime could assess “semantic knowledge”, which explains why he did not use it again as a measure of semantic knowledge. Instead, he preferred to use a classical semantic matching task in his 2009 Brain paper with Josef Spatt, in which they found a clearer dissociation between semantic knowledge and common/novel tool use not only at the behavioral level but also at the cerebral level.

      Introduction: Please expand and clarify this sentence: "However, this involvement seems to be task-dependent, contrary to the systematic involvement of left area PF." The IFG and LOTC activations observed in prior studies are of interest as well. Were they indeed all task-dependent in these studies?

      We agree that this sentence is confusing. We meant that, in the studies reported just above in the paragraph, these regions were not systematically reported contrary to the left area PF. As we think that this information was not crucial for the logic of the paper, we preferred to remove it. 

      Introduction: If implicit mechanical knowledge is acquired through interactions with objects, how is that implicit knowledge conveyed to pass on the material culture to others?

      We thank Reviewer 3 for this comment. Although mechanical knowledge is implicit, it can be indirectly transmitted to other individuals, as shown in two papers we published in Nature Human Behaviour (Osiurak et al., 2021) and Science Advances (Osiurak et al., 2022). Actually, verbal teaching is not the only way to transmit information. There are many other ways of transmitting information such as gestural teaching (e.g., pointing to the important aspects of a task to make them salient to the learner), observation without teaching (i.e., when we observe someone unbeknown to them) or reverse engineering (i.e., scrutinizing an artifact made by someone else). We have shown that even in reverse-engineering conditions, participants can benefit from what previous participants have done to increase their understanding of a physical system. In other words, all these forms of transmission allow the learners to understand new physical relationships without waiting for these relationships to occur randomly in the environment. There is a wide literature on social learning, which describes very well how knowledge can be transmitted without using explicit communication. In fact, it is very likely that such forms of transmission were already present in our ancestors, allowing them to start accumulating knowledge without using symbolic language. We did not add this information in the MS because we think that this was a little bit beyond the scope of the MS. Nevertheless, we cited relevant literature on the topic to help the reader find it if interested in the topic.

      “Yet, recent accounts have proposed that non-social cognitive skills such as causal understanding or technical reasoning might have played a crucial role in cumulative technological culture (6, 29, 66). Support for these accounts comes from micro-society experiments, which have demonstrated that the improvement of technology over generations is accompanied by an increase in its understanding (67, 68), or that learners’ technical-reasoning skills are a good predictor of cumulative performance in such micro-societies (33, 69).” (p. 19)

      What distinguishes this implicit mechanical knowledge from stored knowledge about object manipulation? Are these two conceptualizations really demonstrably (testably) different?

      We agree that it is complex to distinguish between these two hypotheses as suggested by Seidel et al. (2023) cited above (see Reviewer 3 Point 8). We have conducted several studies to test the opposite predictions derived from each hypothesis. The main distinction concerns the understanding of physical materials and forces, which is central to the technical-reasoning hypothesis but not to the manipulation-based approach. Indeed, sensorimotor programs about tool manipulation are not assumed to contain information about physical materials and forces. In the present study, the understanding of physical materials and forces was needed in the four tasks hypothesized as requiring technical reasoning, i.e., the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task. We can illustrate this aspect with items of each of these tasks. Figure 1A is of the mechanical problem-solving task. 

      As explained in the MS, participants had memorized the five possible tools before the scanner session. Thus, for 4 seconds, they had to imagine which of these tools could be used to extract the target out from the box. We did so to incite them to reason about mechanical solutions based on the physical properties of the problem. Then, they had 3 seconds to select the tool with the appropriate shape, here the right one. In this case, the motor action remains the same (i.e., pulling). Another illustration can be given, with the psychotechnical task (Figure 1B).

      In this task, the participant had to reason as to whether the boat-tractor connection was better in the left picture or in the right picture. This requires reasoning about physical forces, but there is no need to recruit sensorimotor programs about tool manipulation. Finally, a last example can be given with the PHYS-Only condition of the mentalizing task (Figure 1D); the logic is the same for the INT+PHYS condition, except that the character’s intentions must also be taken into consideration.

      Here the participant must reason about which picture shows what is physically possible. In this task, there is no need to recruit sensorimotor programs about tool manipulation. In sum, what is common between these three tasks is the requirement to reason about physical materials and forces. We do not deny that motor actions could be simulated in the mechanical problem-solving task, but no motor action needed to be simulated in the other three tasks. Therefore, what was common between all these tasks was the potential involvement of technical reasoning but not of sensorimotor programs about tool manipulation. Of course, an alternative is to consider that motor actions are always needed in all the situations, including situations where no “manipulable tool” is presented, such as a tractor and a boat, a pulley, or a cannon. We cannot rule out this alternative, which is nevertheless, for us, prejudicial because it implies that it becomes difficult to test the manipulation-based approach as motor actions would be everywhere. We voluntarily decided not to introduce a debate between the reasoning-based approach and the manipulation-based approach and preferred a more positive writing by stressing the insights from the present study. Note that we stressed the merits of the manipulation-based approach in the introduction because we sincerely think that this approach has provided interesting insights. However, we voluntarily did not discuss the debate between the two approaches. Given Reviewer 3’s comment (see also Reviewer 1 Point 2), we understand and agree that some words must nevertheless be said to discuss how the manipulation-based approach could interpret our results, thus stressing the potential limitations of our interpretations. Therefore, we added a specific section in the discussion in which we discussed this aspect in more detail.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of tractor, boat, pulley, or cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      Introduction and throughout: The framing of left Area PF as a special area for technical reasoning is overly reductionistic from a functional neuroanatomic perspective in that it ignores a large relevant literature showing that the region is involved with many other tasks that seem not to require anything like technical cognition. Indeed, entering the coordinates -56, -29, 36 (reported as the peak coordinates in common across the studied tasks) in Neurosynth reveals that 59 imaging studies report activations within 3 mm of those coordinates; few are action-related (a brief review indicated studies of verbal creativity, texture processing, reading, somatosensory processing, stress reactions, attentional selection, etc.). Please acknowledge the difficulty of claiming that a large brain region should be labeled the brain's technical reasoning area when it seems to also participate in so much else. The left IPL (including area PF) is densely connected to the ventral premotor cortex, and this network is activated in language and calculation tasks as well as tool use tasks (e.g., Matsumoto, Nair, et al., 2012). What other constructs might be able to unite this disparate literature, and are any of these alternative constructs ruled out by the present data? Lacking this objective discussion, the manuscript does read as a promotion of the investigators' preferred viewpoint.

      We thank Reviewer 3 for this comment. As stressed in the initial version of the MS, we did not write that the left area PF is sufficient but central to the network that allows us to reason about the physical world. Regardless, we agree that an objective discussion was needed on this aspect to help the reader not misunderstand our purpose. We added a section on this aspect as suggested.

      “Before concluding, we would like to point out two potential limitations of the present study. The first limitation concerns the fact that the literature has documented the recruitment of the left area PF in many neuroimaging experiments in which there was no need to reason about physical events (e.g., language tasks). This can be easily illustrated by entering the left area PF coordinates in the Neurosynth database.

      This finding could be enough to refute the idea that this brain area is specific to technical reasoning. Although this limitation deserves to be recognized, it is also true for many other findings. For instance, sensory or motor brain regions such as the precentral or the postcentral cortex have been found activated in many non-motor tasks, the visual word form area in non-language tasks, or the Heschl’s gyrus in nonmusical tasks. This remains a major challenge for scientists, the question being how to solve these inconsistencies that can result from statistical errors or stress that considerable effort is needed to understand the very functional nature of these brain areas. Thus, understanding that the left area PF is central to physical understanding can be viewed as a first essential step before discovering its fundamental function, as suggested by the functional polyhedral approach (56).” (p. 18)
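
      The proximity criterion in the reviewer's comment (activations reported within 3 mm of the peak coordinates) amounts to a Euclidean distance filter, which can be sketched as follows. The peak and the radius are the values quoted in the comment above; the study names and their coordinates are hypothetical placeholders, and the sketch does not reproduce how the Neurosynth database itself performs the lookup.

```python
import math

PEAK_MNI = (-56, -29, 36)  # peak coordinates quoted in the reviewer's comment
RADIUS_MM = 3.0            # search radius quoted in the reviewer's comment


def within_radius(coord, center=PEAK_MNI, radius=RADIUS_MM):
    """True if coord lies within `radius` mm (Euclidean) of center."""
    return math.dist(coord, center) <= radius


# Hypothetical reported peaks, not taken from any database.
reported_peaks = {
    "study_a": (-55, -28, 37),
    "study_b": (-56, -29, 36),
    "study_c": (-50, -29, 36),
    "study_d": (56, -29, 36),  # right-hemisphere homologue
}

nearby = sorted(name for name, xyz in reported_peaks.items() if within_radius(xyz))
print(nearby)
```

      A filter of this kind only counts studies with a nearby peak; it says nothing about which cognitive process drove each activation, which is the point both the reviewer and the authors raise.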

      Discussion: The discussion of a small cluster in the IFG (pars opercularis) that nearly survived statistical correction is noteworthy in light of the above point. This further underscores the importance of discussing networks and not just single brain regions (such as area PF) when examining complex processes. The investigators note, "a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing". In fact, the hypothesis that the IFG and SMG are together related to resolving competition has been previously proposed, as has the more specific hypothesis that the SMG buffers actions and that the context-appropriate action is then selected by the IFG (e.g., Buxbaum & Randerath, 2018). The parallels with the way the SMG is engaged with competing lexical or phonological alternatives (e.g., Peramunage, Blumstein et al., 2011) have also been previously noted.

      We added the Buxbaum and Randerath (2018) reference in this section.

      “The functional role of the left IFG in the context of tool use has been previously discussed (24) and a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing (for a somewhat similar view, see [51]).” (p. 16-17)

      Introduction and Discussion: Please clarify how the technical reasoning network overlaps with or is distinct from the tool-use network reported by many previous investigators.

      We added a couple of sentences in the discussion to clarify this point.

      “It should be clear here that we do not advocate the localizationist position simply stating that activation in the left area PF is the necessary and sufficient condition for technical reasoning. We rather defend the view according to which it requires a network of interacting brain areas, one of them – and of major importance – being the left area PF. This allows the engagement of different configurations of cerebral areas in different technical-reasoning tasks, but with a central process acting as a stable component: The left area PF. Thus, when people intend to use physical tools, it can work in concert with brain regions specific to object manipulation and motor control, thereby forming another network, the tool-use network. It can also interact with brain regions specific to intentional gestures to form a “social-learning” network that allows people to enhance their understanding about the physical aspects of a technical task (e.g., the making of a tool) through communicative gestures such as pointing gestures (42). The major challenge for future research is to specify the nature of the cognitive process supported by the left area PF and that might be involved in the broad understanding of the physical world.” (p. 14)

      Discussion: All of the experimental tasks require a response from a difficult choice in an array, and all of the tasks except for the fluid cognition task are likely to require prediction or simulation of a motion trajectory; whether an embodied or disembodied trajectory is unclear. The Discussion does mention the related (but distinct) idea of an "intuitive physics engine", a "kind of simulator". Please clarify how this study can rule out these alternative interpretations of the data. If the study cannot rule out these alternatives, the claims of the study (and the paper title which labels PF as a technical cognition area) should be scaled back considerably.

      We thank Reviewer 3 for this comment. The authors of the papers on intuitive physics engine or associative learning do not suggest that these processes are embodied. As discussed above, we clarified our perspective on the role of the left area PF and hope that these modifications help the reader better understand it. We warmly thank Reviewer 3 for their comments, which considerably helped us improve the MS.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night).

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important.

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data.

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species.

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill and the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons.

      Strengths

      Although there have been many field observations of circadian-type behaviour in Antarctic krill, relatively few experimental studies have considered this behaviour in terms of circadian patterns of activity. Krill are not a model organism, and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the undue influence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal for considering this aspect of krill biology. Furthermore, the equipment developed for measuring levels of activity is well designed and likely to minimise artefacts.

      We would like to thank Reviewer 2 for their positive assessment of our approach to studying the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion.

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372) leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted (2) that a schematic figure is placed early on in the manuscript where the protocol is explained including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on

      We want to thank Reviewer 2 for pointing out that the experimental design and its rationale do not become clear early in the manuscript, especially for readers outside the field of chronobiology. We added a new figure (now Fig. 1) illustrating the basic principle of chronobiological study design and how we adopted it. We also extended the description at the beginning of the Results section to clarify the rationale behind the experimental design.
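
      The decision logic that the reviewer asks to see as a flow chart can also be written down in a few lines. The following is a minimal Python sketch only: the function name, the amplitude inputs, and in particular the `weak_clock_ratio` cut-off of 0.5 are hypothetical choices made for illustration and are not taken from the manuscript.

```python
def interpret_ld_dd(rhythm_persists_in_dd: bool,
                    amplitude_ld: float,
                    amplitude_dd: float,
                    weak_clock_ratio: float = 0.5) -> str:
    """Classify the outcome of an LD -> DD experiment.

    weak_clock_ratio is a hypothetical cut-off: if the rhythm amplitude in DD
    falls below this fraction of the LD amplitude, the clock is called weak.
    """
    if amplitude_ld <= 0:
        raise ValueError("amplitude_ld must be positive")
    # No rhythm once the light cue is removed: behaviour follows exogenous cues.
    if not rhythm_persists_in_dd:
        return "exogenous cues dominate"
    # Rhythm persists: an endogenous clock exists; damping indicates its strength.
    if amplitude_dd / amplitude_ld < weak_clock_ratio:
        return "endogenous clock, weak"
    return "endogenous clock, strong"
```

      Under these assumptions, a rhythm that persists in DD but whose amplitude drops from 10 to 3 (arbitrary units) would be labelled as driven by a weak endogenous clock.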

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since it logs the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term of what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming that solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      Higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards-directed swimming but also could mean a horizontal increase in activity, for example, representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (now Fig. 3), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset and horizontal directed swimming for feeding and foraging throughout the night.

      We added the following sentence to the description of the activity metric in the Methods section to clarify this point (lines 465-469):

      “To accomplish this, we organized the raw beam break data from all five detector modules in each experimental column in chronological order. We selected only those beam break detections that occurred after a detection in the detector module positioned lower on the column. Like this, we consider upward swimming movements throughout the full height of the column.”
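
      The counting rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the AMAZE processing code: the data layout (timestamp, module index) and the function name are hypothetical, and "after a detection in a lower module" is read here as "the immediately preceding detection came from a lower module".

```python
from typing import List, Tuple


def count_upward_movements(detections: List[Tuple[float, int]]) -> int:
    """Count beam breaks that directly follow a detection at a lower module.

    detections: (timestamp_seconds, module_index) pairs from one column,
    with module_index 0 for the lowest and 4 for the highest detector module
    (hypothetical layout).
    """
    ordered = sorted(detections)  # chronological order across all modules
    upward = 0
    previous_module = None
    for _, module in ordered:
        # Keep only detections preceded by a detection lower on the column.
        if previous_module is not None and module > previous_module:
            upward += 1
        previous_module = module
    return upward
```

      With this rule, a krill moving from the lowest module to the second and third modules contributes two counts even if it never reaches the top of the column, which is the point made in the response.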

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript. We, therefore, omitted this part in the revised version of the manuscript to keep the discussion concise and focused on the results.

      Other aspects

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced.

      We thank the Reviewer for pointing this out. We provided an explanation for the term “bimodal” in the Results section, where the two clock driven activity bouts are described first, by extending the sentence in lines 161-164, which now reads:

      “This suggests that the circadian clock drives a distinct bimodal activity pattern with two activity peaks in one day, i.e. the evening and late-night activity bouts. In contrast, the morning activity bout is triggered by the onset of illumination in the experimental set-up.”

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      We would like to thank the Reviewer for pointing this out and agree that adding the idea of an endogenous control of midnight sinking would be interesting to the discussion. We added the following section to the Discussion (lines 335-343):

      “Interestingly, the decrease in clock-controlled swimming activity during the early night, right after the evening activity bout, may further facilitate a phenomenon called “midnight sinking”, which describes the sinking of animals to intermediate depths after the evening ascent, followed by a second rise to the surface before the morning descent. This behavior has been observed in a number of zooplankton species, including calanoid copepods (see 69, 70 and references therein) and krill (71). While previous studies suggested several exogenous factors, such as satiation or predator presence, as drivers of the midnight sink (69, 70), our study suggests that this pattern may be partly under endogenous control.”

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear.

      In our study, we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24-hour cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which aligns with our findings at the individual level. We revised the Section in the discussion for more clarity, which now reads:

      “Data from Piccolin et al. (20) showed a strong damping of the amplitude and indication of a remarkably short (~12 h) free running period (FRP) of vertical swimming behavior of a group of krill under constant darkness. The short period found in Piccolin et al. (20) is in line with our findings of a bimodal activity pattern under DD conditions on the individual level, suggesting that the ~12 h rhythm in group swimming behavior in Piccolin et al. (20) could have resulted from a bimodal activity pattern at the individual level, as found in our study.” (lines 212-219).

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask that this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations, and the fishery actively targets E. superba and monitors their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2022), and fishing operations would stop if non-target species were caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that E. superba predominantly causes the backscattering signal shown in Figure 5 (now Fig. 6).

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four seasons (experiment 2), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments.

      To improve understanding, we modified the description in the Results, Discussion, and Methods sections, as well as the caption of Figure 5 (now Fig. 6), which now read:

      “To investigate whether krill swarms exhibited daily behavioral patterns in swimming behavior in the field before they were sampled for seasonal experiments, hydroacoustic data were recorded from the fishing vessel, continuously over a three-day period prior to sampling for the seasonal experiments described above…” (lines 191-194).

      “Furthermore, hydroacoustic recordings demonstrate that most krill swarms sampled exhibited synchronized DVM in the field in the days directly before sampling for behavioral experiments, indicating that in this region, krill remain behaviorally synchronized across a wide range of photoperiods.” (lines 397-400).

      “Hydroacoustic data were collected using a hull-mounted SIMRAD ES80 echosounder (Kongsberg Maritime AS) aboard the Antarctic Endurance, covering three days before the sampling for each of the seasonal behavioral experiments of experiment 2” (lines 512-515).

      “We only included data during active fishing periods and the vessel is specifically targeting E. superba, which occurs in large monospecific aggregations. Further, krill fishery bycatch rates are very low (0.1-0.3%, 84), which makes it highly probable that the recorded signal represents krill swarms.” (lines 523-526).

      “Hydroacoustic recordings showing the vertical distribution of krill swarms in the upper water column (<220 m) below the vessel, visualized by the mean volume backscattering signal (200 kHz), on the three days prior to krill sampling for experiments…” (lines 802-804).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, this is a logical and well-written manuscript. I have very few comments to consider addressing.

      The Results lead with a paragraph outlining the experimental approach. This is good, but you use the term "experiments" to refer to both the two sets, and the two or four subsets of experiments. Perhaps consider the subset experiments as "treatments"? I understood what you meant, but it took a few read-throughs to be sure I got it.

      We thank the reviewer for pointing this out and have changed the nomenclature of the experiments throughout the manuscript. We now refer to the two sets of experiments as experiment 1 and 2, to the subsets of experiment 1 as “short day treatment” and “long day treatment”, and to the subsets of experiment 2 as summer treatment, late summer treatment, autumn treatment, and winter treatment. We also believe that the new Figure 1 now makes the experimental design easier to follow.

      Ln 140: "...off and decrease at lights-on."

      We adjusted the sentence accordingly.

      Ln 244: Can you define "extreme photic conditions"? I get what you mean, but to be clear to the reader this would help.

      We adjusted the sentence, which now reads:

      “This could confer a significant adaptive advantage to species inhabiting environments characterized by extreme photic conditions (53, 54, 60), such as phases of polar night or midnight sun as well as rapid changes in daylength, or species that rely on precise photoperiodic time measurement for accurate seasonal adaptation.” (lines 258-261).

      Figures: Consider adding an LSP for groups in Fig 1. Also, it would be useful to have LSP period estimates for each individual tested. This could be a separate table, or it could be added to the individual activity plots. Should S3 and S4 be reversed?

      We thank the reviewer for their suggestion and added an LSP as figure 1d (now Fig. 2d) to statistically support the group activity shown in Figure 1c (now Fig. 2c) as suggested. We added the individual animals' LSP period estimates to supplementary figures S2, S7, S8, S9, and S10. We also reversed Figures S3 and S4 to match the appearance in the main text. 
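
      For readers unfamiliar with period estimation, the sketch below shows how a dominant period can be read off a periodogram. It assumes that "LSP" denotes the Lomb-Scargle periodogram, as is common in chronobiology; the function names, the candidate period grid, and the synthetic hourly activity series are hypothetical and are not the authors' analysis pipeline.

```python
import math
from typing import List, Sequence


def lomb_scargle_power(times: Sequence[float], values: Sequence[float],
                       period: float) -> float:
    """Normalized Lomb-Scargle power of `values` at one trial period."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    omega = 2.0 * math.pi / period
    # Time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * omega * t) for t in times),
                     sum(math.cos(2 * omega * t) for t in times)) / (2 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((v - mean) * c for v, c in zip(values, cos_terms))
    ys = sum((v - mean) * s for v, s in zip(values, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return (yc * yc / cc + ys * ys / ss) / (2.0 * variance)


def dominant_period(times: Sequence[float], values: Sequence[float],
                    periods: Sequence[float]) -> float:
    """Return the trial period with the highest periodogram power."""
    return max(periods, key=lambda p: lomb_scargle_power(times, values, p))


# Synthetic example: hourly activity over four days with two peaks per 24 h.
hours: List[float] = [float(h) for h in range(96)]
activity = [10.0 + 5.0 * math.sin(2 * math.pi * h / 12.0) for h in hours]
trial_periods = [8.0 + 0.5 * i for i in range(57)]  # 8 h to 36 h
estimated = dominant_period(hours, activity, trial_periods)
```

      A series with two activity bouts per day yields a dominant period near 12 h, which is how a bimodal individual pattern and a ~12 h group-level rhythm can describe the same behavior.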

      Fig 5: are the light regime bars for b and c correct? They look similar, but they are only 15 days apart, so perhaps they are correct as is.

      We double checked the light regime bars in Fig. 5b and c (now 6b and c) and they are correct as is.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin negatively affects SWR rate and power, but not GLP1, insulin, or leptin. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to be impactful to the study of the impact of feeding on sleep behavior, as well as the phenomena of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

      We thank the reviewer for their favorable assessment of the work and its potential impact. We have added all requested details in the text and figure legends and revised the wording of the manuscript to improve its clarity.

      Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is particularly compelling, as this aligns well with opposing hormone functions on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

      We thank the reviewer for their favorable assessment of the work and its contribution to our understanding of non-mnemonic hippocampal function. Whether there are behavioral differences that occur following administration of the different hormones is a great question, yet unfortunately our study design did not include fine behavioral monitoring to the degree that would allow answering it. While some previous studies have partially addressed the behavioral consequences of the delivery of these hormones (and we reference these studies in our Discussion), how these changes may interact with the hippocampal and hypothalamic effects we observe is a very interesting next step.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      We thank the reviewer for their appreciation and constructive review of the work.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.

      This is an important point that we believe we addressed in a few complementary ways. First, the mediation analysis we implemented measures the magnitude and significance of the contribution of food on SWR power after accounting for the effects of delta power, showing a highly significant food-SWR contribution. While the objective of subsampling is similar, mediation is a more statistically robust approach as it models the relationship between food, SWR power, and delta power in a way that explicitly accounts for the interdependence of these variables. Further, subsampling introduces the risk of losing statistical power by reducing the sample size, due to exclusion of data that might contain relevant and valuable information. Mediation analysis, on the other hand, uses the full dataset and retains statistical power while modeling the relationships between variables more holistically. However, as we were not satisfied with a purely analytical approach to test this issue, we carried out a new set of experiments in ad-libitum fed mice, where there is no concern of food restriction impairing sleep quality in the presleep session. In these conditions food amount also significantly correlated with, and showed significant mediation of, the SWR power change. Finally, we acknowledge and discuss this point in the Discussion, highlighting that given the known relationship between cortical delta and SWRs, it is challenging to fully disentangle these signals. 
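
      The logic of a mediation analysis can be illustrated with a small ordinary-least-squares sketch. The assumptions here are loud ones: delta power is treated as the mediator between food amount and SWR power change, the numbers are invented for illustration, and no significance test (for example a bootstrap of the indirect effect) is included. The actual model and software used in the study are not specified in this response.

```python
from typing import Dict, Sequence, Tuple


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def simple_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x: Sequence[float], m: Sequence[float],
                         y: Sequence[float]) -> Tuple[float, float]:
    """OLS slopes of y on x and m (with intercept), from centered normal equations."""
    mx, mm, my = _mean(x), _mean(m), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    return (sxy * smm - smy * sxm) / det, (smy * sxx - sxy * sxm) / det


def mediation_effects(food: Sequence[float], delta: Sequence[float],
                      swr: Sequence[float]) -> Dict[str, float]:
    a = simple_slope(food, delta)                      # food -> delta power
    direct, b = two_predictor_slopes(food, delta, swr)  # food -> SWR given delta; delta -> SWR
    return {"direct": direct, "indirect": a * b,
            "total": simple_slope(food, swr), "a": a, "b": b}


# Invented illustrative data (grams of chow, delta power, SWR power change).
food_g = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
delta_power = [1.0, 1.4, 1.1, 1.9, 1.6, 2.2, 2.0]
swr_power = [0.5 + 2.0 * f + 3.0 * d for f, d in zip(food_g, delta_power)]
effects = mediation_effects(food_g, delta_power, swr_power)
```

      In this decomposition the total effect equals the direct effect plus the indirect effect through the mediator, and every session contributes to the estimate. A delta-matched subsample would instead discard sessions that have no counterpart, which is the loss of power referred to above.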

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      Following the reviewer’s comment, we have quantified and compared the amount of time spent in NREM sleep in the Pre and Post session pairs in which the animals were food restricted, with 0-1.5 g of chow given between the sleep sessions. We found that there was no significant difference in the amount of time spent in NREM sleep in the Pre and Post sessions. We have added this result to the Results section of the manuscript and as a new Supplementary Fig. 1. 

      Additionally, we have added details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep). 
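
      The rate definition in the last sentence (SWRs per second of NREM sleep) can be made concrete with a short sketch. It assumes that sleep scoring has already produced NREM intervals as (start, end) times in seconds and that SWR detection has produced event timestamps; the function name and data layout are hypothetical, and the scoring step itself (done with AccuSleep in the study) is not reproduced here.

```python
from typing import Sequence, Tuple


def nrem_swr_rate(swr_times: Sequence[float],
                  nrem_intervals: Sequence[Tuple[float, float]]) -> float:
    """Number of SWRs falling inside NREM, divided by total NREM seconds."""
    total_nrem = sum(end - start for start, end in nrem_intervals)
    if total_nrem <= 0:
        raise ValueError("no NREM sleep in this session")
    # Count only events inside a scored NREM interval (half-open: start <= t < end).
    n_events = sum(
        1 for t in swr_times
        if any(start <= t < end for start, end in nrem_intervals)
    )
    return n_events / total_nrem


# Hypothetical session: two NREM bouts of 100 s each and six detected events.
rate = nrem_swr_rate([10.0, 50.0, 150.0, 250.0, 299.5, 300.0],
                     [(0.0, 100.0), (200.0, 300.0)])
```

      Normalizing by NREM time rather than session length means that two sessions with different amounts of sleep remain comparable, which is relevant to the reviewer's concern about sleep quantity.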

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.

      We have generated the summary plots the reviewer requested and have now included them in Supplementary Fig. 2. 
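
      The quantity the reviewer asks to see can be sketched as follows: for each SWR, the photometry trace is integrated over a window before and a window after ripple onset, and the paired pre/post values are then compared. This is a minimal illustration; the window length, sampling interval, and function names are hypothetical, only the paired t statistic is computed (no p-value), and a non-parametric alternative would be needed if the differences are not normally distributed.

```python
import math
from typing import Sequence, Tuple


def trapezoid_integral(samples: Sequence[float], dt: float) -> float:
    """Trapezoidal integral of evenly sampled values."""
    return sum((a + b) / 2.0 * dt for a, b in zip(samples, samples[1:]))


def pre_post_integrals(trace: Sequence[float], onset_index: int,
                       window: int, dt: float) -> Tuple[float, float]:
    """Integrate `window` samples before and after SWR onset (onset sample shared)."""
    if onset_index - window < 0 or onset_index + window >= len(trace):
        raise ValueError("window extends beyond the trace")
    pre = trace[onset_index - window: onset_index + 1]
    post = trace[onset_index: onset_index + window + 1]
    return trapezoid_integral(pre, dt), trapezoid_integral(post, dt)


def paired_t_statistic(pre_values: Sequence[float],
                       post_values: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for post minus pre."""
    diffs = [b - a for a, b in zip(pre_values, post_values)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical trace that steps up at SWR onset (index 5), sampled every 0.1 s.
example_trace = [0.0] * 5 + [1.0] * 5
pre_area, post_area = pre_post_integrals(example_trace, onset_index=5, window=3, dt=0.1)
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 7.0])
```

      Plotting the per-event `pre_area` and `post_area` pairs as points over box-and-whisker plots gives exactly the kind of summary requested, alongside the reported p-values.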

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      This is an interesting point and we have added a section to the Discussion to discuss it (pg. 17, last paragraph)

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

      We have added a section to the Discussion proposing potential reasons for this point (pg. 16, last paragraph)

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      Major Comments:

      (1) The experiments involve very precise windows of time for sleeping and eating that seem impossible to control. For example, the authors state that for the experiments in Figure 1, there was a 2-h sleep period, followed by a 1-h feeding period, followed by another 2-h sleep period. Without sleep deprivation procedures or other environmental manipulations, how can these periods be so well-defined? Even during the inactive period, mice typically don't sleep for 2-h bouts at once, and the addition of food would not likely lead to an exact 1-h period of wakefulness in the middle. The validity of these experimental times would be more believable if the authors provided much more data on these sessions. For example, the authors could provide a table or visual display of data for the actual timing of the pre-sleep, eating, and post-sleep phases with exact time measurements and/or visual display of sleep versus wakefulness.

      This is an important point, which we were not clear enough about in the original submission. While the durations of the Pre-sleep, Wake and Post-sleep sessions were indeed 2 h, 1 h and 2 h respectively, the animals did not actually sleep during the entirety of the sleep sessions. Importantly, we performed sleep state scoring on all sessions, and only analyzed identified NREM sleep for all SWR analyses. Following the reviewer’s comment (and that of Reviewer 1), we have quantified and compared the amount of time spent in NREM sleep in the Pre and Post session pairs in which the animals were food restricted and 0-1.5 g of chow were given between the sleep sessions. We found that there was no significant difference in the amount of time spent in NREM sleep in the Pre and Post sessions. We have added this result to the Results section of the manuscript and as a new Supplementary Fig. 1. 

      Additionally, we have added details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep). 
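
      As a minimal sketch of the rate definition described above (SWRs per second of NREM sleep), the following assumes that sleep scoring has already produced a list of non-overlapping NREM intervals and that SWR detection has produced event times. The function and variable names are hypothetical and do not come from the AccuSleep toolbox or the authors' code.

```python
from bisect import bisect_right

def swr_rate_in_nrem(swr_times, nrem_intervals):
    """Return the number of SWRs per second of NREM sleep.

    swr_times: SWR event times in seconds (hypothetical input).
    nrem_intervals: non-overlapping (start, end) NREM periods in seconds.
    """
    intervals = sorted(nrem_intervals)
    total_nrem = sum(end - start for start, end in intervals)
    if total_nrem <= 0:
        raise ValueError("no NREM sleep scored in this session")
    starts = [start for start, _ in intervals]
    count = 0
    for t in swr_times:
        # Find the last interval starting at or before t, then check its end.
        i = bisect_right(starts, t) - 1
        if i >= 0 and t < intervals[i][1]:
            count += 1
    return count / total_nrem

# Illustrative values only: 3 of the 5 events fall inside 200 s of NREM.
rate = swr_rate_in_nrem([10, 50, 150, 250, 350], [(0, 100), (200, 300)])
```

      Normalizing by scored NREM time rather than by session duration keeps the rate comparable between Pre and Post sessions even when the animals sleep for different amounts of time.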

      (2) I may have missed this (although I tried searching in the text and figure legend), but the authors did not state the difference between green versus red bar colors in Figure 1 C-E. For Figures 1 F-J, do the individual dots represent both the test (fed) animals and control animals, or just the test animals?

      We thank the reviewer for the opportunity to clarify these points. Red bars in Fig. 1C-E represent the SWR changes observed following delivery of 0.5 g of chow or more, while the green bars represent the changes observed following delivery of less than 0.5 g. Fig. 1F-J includes both the experimental and control animals, with the control animals plotted as having received 0 g of food. This information has now been added to the figure legend.

      (3) For the jello experiments in Figure 3, was there only 1 trial per animal? Previous studies show that animals learn the caloric value of jello after subsequent trials, so whether or not multiple trials took place in each animal is important for interpretation of the results.

      In Figure 3, the datapoints within each panel represent different animals and this information has now been added to the figure legend. Nevertheless, the animals were previously habituated to all foods, including regular jello, sugar-free jello and chocolate. While we consider it unlikely that this prior experience was sufficient to underlie the differential effects on SWRs, we cannot fully rule out the possibility that it provided some ability to predict the caloric value and consequences of the different foods. We have expanded our acknowledgement of this point in the Discussion (pg. 17, second paragraph).

      (4) The experiments in Figure 5 are informative but don't relate to the experiments in the rest of the study. It is difficult to interpret their meaning given that these experiments take place over seconds while the other experiments take place over hours. Some attempt should be made to bridge these experiments over the timescales relevant for the behaviors studied in Figures 1-4.

      We have now further acknowledged and discussed the point that our investigation is limited to the timescale of seconds around SWRs, and thus identified a potential communication channel, but whether and how this communication changes across hours following feeding remains for future studies (pg. 18, second paragraph).

      (5) Figure 5B should depict the x-axis in seconds, not an arbitrary set of times from a recording.

      We have replaced these with a time scale bar.

      Minor Comments:

      (6) The writing of the manuscript can be improved in many places:

      Sometimes the writing could be more precise. For example, the Abstract states: "hippocampal sharp wave ripples (SWRs)... have been shown to influence peripheral glucose metabolism." Could this be written in a more informative way, rather than just stating "has been shown to influence?" A few more words would provide a lot more information. Similarly, at the end of the Introduction: "we set out to test the hypothesis that SWRs are modulated following meal times as part of the systems-level response to changing metabolic needs." This is not a strong hypothesis... could it be written to boldly state how the SWRs will be modulated (increase or decrease) and provide more assertive information?

      The writing can be grandiose at times. Phrases such as "life is a continuous journey" or "the hypothalamus is a master regulator of homeostasis" are a bit sophomoric and too colloquial.

      Finally, a representative recording should be referred to as just that, a "representative recording," as opposed to a "snippet," which is also colloquial. This word is used in the figure legends to Figures 1 and 5, and misspelled as "sinpper" in Figure 1.

      We have reworded all these sentences and phrases to make them clearer, more concrete and more formal.

      (7) The methods state that the study used both male and female mice. Were they used in equal numbers across experiments?

      Only one female was used in the final dataset, and we have corrected the wording accordingly.

      Reviewer #2 (Recommendations for the authors):

      Great paper!

      Thanks!

      Reviewer #3 (Recommendations for the authors):

      Below are some minor requests for clarification, including in figures:

      (1) Fig. 5H y-axis should say "normalized dF/F."

      Done

      (2) Fig. 1B is missing a y-axis label. It may be clearer to display separate y-axis scale bars for each component (SWR envelope, ripple-filtered amplitude, etc).

      Done

      (3) Please include labels for brain areas and methodological components in Fig. 5A.

      Done

      (4) Should Fig. 5B have the same y-axis or scale bars as 1B?

      We have edited the figure labels and legends to be visually similar

      (5) In Fig. 5J, is the y-axis a count of sessions?

      Yes, we have added that to the y-axis label

      (6) Could the authors please clarify whether the sugar-free jello was sweetened with an artificial sweetener? If so, this is a robust control for the rewarding nature of the two jellos, so a quick clarification would highlight this strength of the experiment.

      We thank the reviewer for this great point. Indeed, the sugar free jello contained artificial sweeteners (Aspartame and Acesulfame Potassium). We have added this information to the Results and Methods.

      (7) It appears in Fig. 5 that there may be a reliable dip in activity **at** the time of SWR onset, followed by the increase afterward, as shown in the example FP trace and the individual ripple-triggered traces. Is this indeed the case, and does this dip fall significantly below baseline? This characterization would be interesting, but I acknowledge is not necessarily crucial to the study to include.

      This would indeed be an interesting finding, but upon examination and statistical testing, we found that this is not the case. We believe this may appear as such due to the normalization of the traces.

      (8) The authors mention a reduction in ripple rate following insulin under food restriction as the only significant effect for insulin, GLP-1, and leptin, yet there was also a significant increase (at p<0.05) in ripple duration for GLP-1 in the ad lib condition. Is this not considered noteworthy?

      This is a fair point and we have reworded the description of this result to simply state that there were no robust, consistent, dose-dependent effects of GLP-1, leptin and insulin on SWR attributes.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, play a key role in theta sequence development. How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The revised version of this paper has been significantly improved. Major concerns in the previous round of review on technical and conceptual aspects of the relationship between gamma oscillations and theta sequences are addressed. The main conclusion is supported by the data presented.

      Reviewer #2 (Public review):

      The authors have conducted new analysis to address the issues I and the other reviewers raised in our original revision. As a result, the revised manuscript has been substantially improved.

      We thank the two reviewers for their positive comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are, however, still a few remaining issues that need further clarification.

      - Despite the authors' explanation and comparison with Kitanishi et al., 2015, Neuron, I still find that the reduced number of significantly gamma phase-locked cells is at odds with most previous reports (e.g., Csicsvari et al., 2003; Colgin et al., 2009; Belluscio et al., 2012; Schomburg et al., 2014; Cabral et al., 2014; Fernandez-Ruiz et al., 2017; Lopes dos Santos et al., 2018). There can be several issues to explain this difference, like the choice of LFP reference channel. The authors should at least acknowledge this difference in the text.

      We thank the reviewer for this suggestion. We discussed the potential reasons for the different proportions of gamma phase-locked cells in the Discussion (lines 367-380).

      - The new Figure R2 is very useful and should be included in the manuscript. It would be even better to expand the frequency range to higher frequencies to show where the maximum peak is. Still, the potential contribution of spike leakage should be acknowledged. While I agree that it will not account for all fast gamma spike modulation, it is certainly a contributing factor. Further evidence of this is that the coupling strength seems to keep increasing towards the supra-gamma frequency range in Fig R2. This is to be expected given that the authors have used the local LFP from the same tetrode where cells were recorded, which is never a good practice.

      We thank the reviewer for this suggestion. Fig R2 has now been moved into the manuscript as part of Figure 2-figure supplement 2 (lines 133-135). Regarding the contribution of spike leakage from using the local LFP, we also detected FG-cells using the LFP from a different tetrode, i.e. the central one of the bundle, which was located in the cell body layer, and found a similar proportion of FG-cells phase-locked to ~75 Hz (Fig R3, now Figure 2-figure supplement 2C-F). Thus, we think that using the local LFP does not affect the main conclusion, and we decided to keep the original results. We also acknowledged the potential contribution of spike leakage in the Discussion (lines 372-377).

      - From the authors' answer I understand that recordings were almost exclusively conducted from the deep CA1 pyramidal layer. This would preclude any meaningful interpretation of the deep/superficial differences in the distribution of FG and NFG cells. This is not a crucial point for the paper but needs to be acknowledged.

      We thank the reviewer for this suggestion. We acknowledged this limitation on interpreting the deep/superficial differences in the distribution of FG- and NFG-cells in the Discussion (lines 380-386).

      - I am afraid that the authors interpreted my comment about authorship in the opposite way that I intended. I meant that the usual practice is that the last author of the manuscript is the person who has been the main intellectual driver of the work, not the most senior one necessarily. I guess that is Dr. Zheng not Dr. Ming. However, I leave this decision to the discretion of the authors.

      We thank the reviewer for this rigorous consideration. Dr. Ming and Dr. Zheng were both the main intellectual drivers of this work. Therefore, we have decided to keep the current author order in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript.

      We have modified the abstract to read:

      “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the results to read:

      “These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A).”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity.”

      Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript.

      We have modified the results to read:

      “Consistent with a model in which the stability of the linker plays a conserved regulatory role, the AlphaFold2 models for many of the predicted structures have unfavorable polar residues buried in the coiled-coil interface (positions a and d, for which non-polar residues are most favorable) (Figure 5 – figure supplement 2).”
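
      The heptad-position criterion in the quoted text can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the heptad register is assumed to be known in advance, the set of residues treated as polar is a simple choice made here, and the example sequence is invented rather than taken from RsbU or any other protein.

```python
# Assumption: a simple set of polar and charged residues (one-letter codes).
POLAR = set("STNQDEKRH")

def polar_core_positions(sequence, register_start="a"):
    """List polar residues at heptad positions a and d of a coiled coil.

    sequence: linker sequence in one-letter code (hypothetical input).
    register_start: heptad position (a-g) assigned to the first residue.
    Returns (index, residue, heptad position) tuples.
    """
    heptad = "abcdefg"
    offset = heptad.index(register_start)
    hits = []
    for i, residue in enumerate(sequence.upper()):
        position = heptad[(i + offset) % 7]
        if position in "ad" and residue in POLAR:
            hits.append((i, residue, position))
    return hits

# Invented sequence: Q at a d position and N at an a position are flagged.
hits = polar_core_positions("LAAQAAANAALAAA")
```

      In practice the register would come from the structural model or from a coiled-coil prediction tool rather than being supplied by hand.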

      Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the stressosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure.

      We have added the following text to the introduction:

      “RsbT is sequestered in a megadalton stress sensing complex called the stressosome, and is released to bind RsbU in response to specific stress signals including ethanol, heat, acid, salt, and blue light”

      We have added a new figure panel (2C) that shows the model for how Q94L, M166V, and RsbT binding induce conformational change of the PPM domain to recruit metal cofactor and activate RsbU (analogous, but slightly different from the mechanism for SpoIIE).

      The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry, would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data has inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility supporting a high confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

      We have modified the text of the Abstract: “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the text of the Results: “These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A).”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity”

      We have also added Figure 1 – figure supplement 2 with the AlphaFold2 models colored by the pLDDT scores.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Baral and colleagues investigate the regulatory mechanisms of the General Stress Response (GSR) in Bacillus subtilis, focusing on the phosphatase RsbU and its regulation by the protein RsbT. The GSR is a critical adaptive mechanism that allows bacteria to survive under various stress conditions by reshaping their physiology through a broad transcriptional response. RsbU, a key player in the GSR, facilitates the activation of the transcription factor SigB by dephosphorylating RsbV. This activation is mediated through a partner-switching mechanism involving RsbT. Baral and colleagues use a combination of genetic screening, structural predictions via AlphaFold2, and biophysical techniques such as SAXS and MALS to present a model for how RsbT regulates RsbU. Key findings include the identification of specific amino acid substitutions that enhance RsbU activity, the role of the α-helical linker in RsbU dimerization and activation, and the potential broader conservation of these mechanisms across bacterial species. However, as described below, additional work is required to solidify the results.

      Major Points

      (1) The manuscript is misnamed--it dissects a single step of the signal-transduction pathway regulating the general stress response. Instead, it is rather seeking a generalizable mechanism for kinase-phosphatase interactions across stresses.

      We have edited the title to “A General Mechanism for Initiating the General Stress Response in Bacteria” to reflect that this study addresses the initiating event of the general stress response.

      (2) The genetic screen likely has limitations in detecting all possible variants that could affect RsbU activity. The readout is specific to σ^B activation, and the focus on specific amino acid substitutions may overlook other significant regions or mechanisms involved in the regulation of RsbU, particularly those involving RsbV and RsbT.

      Our screens were specifically designed to identify features of RsbU that contribute to regulation. Importantly, RsbU does not have any known targets other than RsbV and the downstream σ^B response, but we agree that substitutions in either RsbV or RsbT could influence RsbU activation. In principle, our suppressor screen with RsbU^Y28I could have identified RsbT variants (rsbT was mutagenized in this screen), but we did not identify any such variants in the screen. We conducted a separate screen (published elsewhere) that specifically addressed how RsbU recognizes RsbV.

      (3) The authors largely focus on the biochemical and structural aspects of RsbU regulation. There is limited discussion on the broader functional implications of these findings in the context of bacterial physiology and stress response. Incorporating more in vivo studies to show how these mechanisms impact bacterial survival and adaptation would provide a more comprehensive understanding.

      We appreciate this comment, but did not conduct additional studies of survival and adaptation because the phenotypes of σ^B deletion in B. subtilis under laboratory conditions are relatively mild and therefore difficult to assay. Future studies to address this in other systems could be highly informative.

      (4) The results primarily support the model of linker-mediated dimerization and rigidity. However, other potential regulatory mechanisms or interacting partners might also play significant roles in RsbU activation. A more thorough exploration of these possibilities would strengthen the study's conclusions.

      One of the major advantages of RsbU as a model for initiation of the general stress response is that the system is discrete, with all evidence pointing to there being a single primary input (RsbT) and output (dephosphorylation of RsbV). While there are other possible variations on the system (for example RsbU may be directly activated by manganese stress), we focused on this system precisely because of its simplicity.

      (5) While the study presents evidence for the conservation of the described mechanism across different species, this assumption is based on structural predictions and limited experimental data. Broader experimental validation across diverse bacterial species would be necessary to substantiate this claim. Coevolution coupling along with conservation/evolutionary studies could be considered.

      We have altered the language in the paper to emphasize where we are making inferences from predictions that are therefore more speculative. We agree that a more detailed analysis of the evolutionary coupling would likely be fruitful. We note that these couplings are the major driving force of AlphaFold predictions, suggesting that these couplings contributed to the models that we analyzed.

      (6) The reliance on AlphaFold2 for structural predictions introduces potential biases and uncertainties inherent in computational models. Experimental validation of these models through additional techniques such as cryo-EM or X-ray crystallography would strengthen the conclusions.

      We agree with this point, which is why we performed extensive analysis and validation of the models for RsbU using SAXS, genetics, and biochemistry. The proposed techniques are made more challenging by flexibility and heterogeneity, which we detected in our experiments. Our attempts thus far at experimental structure determination are consistent with this being a major technical hurdle.

      (7) SAXS data provide low-resolution structural information, and the interpretation of flexibility versus rigidification might be overemphasized in its interpretation. This part of the study was difficult to interpret. Improving readability by breaking down the text into sections with clear headings for each figure panel and clarifying descriptions of the panels and methods would help. Complementary high-resolution techniques could provide a more definitive view of the linker's conformational changes.

      We have modified the presentation of the figures to clarify the SAXS analysis. The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (8) The study primarily focuses on the model where RsbT binding rigidifies the RsbU linker. Alternative hypotheses, such as subtle conformational adjustments without complete rigidification, are not extensively explored or ruled out.

      Our analysis of the SAXS data strongly suggests that a subtle conformational change could not account for the scattering data that we obtained. We have modified the text to clarify this point.

      “Indicative of significant deviation between the RsbU structure in solution to the AlphaFold2 model, the scattering intensity profile (I(q) vs. q) was a poor fit (χ<sup>2</sup> 12.53) to a profile calculated from the AlphaFold2 model of an RsbU dimer using FoXS (Schneidman-Duhovny et al. 2016; Schneidman-Duhovny et al. 2013) (Fig. 4A). We therefore assessed the SAXS data for the RsbU dimer for features that report on flexibility (Kikhney & Svergun 2015). First, the scattering intensity data lacked distinct features caused by the multi-domain structure of RsbU from the AlphaFold2 model (Fig. 4A).”

      (9) Future studies should aim to validate the AlphaFold2 predictions with high-resolution structural techniques. This would provide definitive evidence for the proposed conformational states of RsbU with and without RsbT.

      The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (10) Investigating the RsbU-RsbT interaction in vivo using techniques like FRET, co-immunoprecipitation, or live-cell imaging would provide a more comprehensive understanding of their functional dynamics in a cellular context.

      We appreciate the reviewer’s suggestions for future experiments.

      (11) Exploring and testing alternative models of RsbU activation, such as partial rigidification or different modes of conformational change, would strengthen the conclusions.

      While our data strongly support that a flexible-to-rigid transition controls RsbU activation, we agree that it is possible that other mechanisms of linker modification could control other phosphatases and we discuss this at some length in the discussion.

      (12) The figure legends are quite dense and could benefit from some streamlining.

      We have edited the figure legends for clarity and length.

      Reviewer #2 (Recommendations for the authors):

      (1) Activation assays (Figures 1, 3, S2) are presented here as blue or white spots (reflecting a reporter activity). While off and on these are fairly clear, it is more difficult to compare the degree of activity (for instance that rsbU<sup>Q94L</sup> is more active than M166V). It would also be good to clearly present in the text the logic of asking if the mutant is RsbT independent or not (and the interpretation of that). Quantitative assays of these would be very useful.

      We chose not to perform quantitative LacZ assays here because of several complications in interpreting these results that we encountered in our previously published study (Ho and Bradshaw, 2021). However, the level of blue pigmentation shown in Figure 1B for RsbU Q94L and RsbU M166V is qualitatively different, making the comparison possible. Most importantly, we observed cell-density-dependent changes in LacZ activity in the absence of rsbT for rsbU<sup>M166V</sup>-expressing cells, meaning that comparisons between strains would be difficult. Additionally, we found that it was important to make a chromosomal replacement of rsbU to see the full effect of the M166V substitution. However, we were not able to construct a similar rsbU<sup>Q94L</sup> strain, likely because the high level of σ<sup>B</sup> activity is lethal (we were able to construct this strain when σ<sup>B</sup> was deleted but only obtained strains with additional loss-of-function mutations in RsbU when σ<sup>B</sup> was present).

      We have modified the text to explain the logic of identifying RsbT independent variants: “We previously conducted a genetic screen (Ho & Bradshaw 2021) to identify features of RsbU that are important for phosphatase regulation by isolating gain-of-function variants that are active in the absence of RsbT.”

      (2) Explain Figure S8 graphs: as much as Alphafold is now in use, the authors should provide some further explanation of what is shown here. Blue (low error) is good, presumably. What are the A, B, C, and D sections showing? Different parts of a given letter region (and between them)? What is the x-axis? Is the top-ranked model used in every case in the text? How different are these models? The Methods section could be used for some of this (but doesn't in its current form). This also becomes important for the models generated later in the paper (Figure S7), which look rather different here.

      We have modified figure S8 to include additional labels and have added structures with the pLDDT scores shown. We have additionally modified the figure legends and methods to provide the requested information.

      (3) Figure 1C, D, Figure S2: amino acid ends of linker domains could be shown (text discusses 83-97 the linker as a two-turn coiled coil; Q94 is pretty close to the end of this coiled-coil? Figure S2 is even less clear - addresses of other amino acids would help, and/or an added sequence showing the full linker and coiled-coil region). Some explanation for positions for readers to focus on for full coiled-coil would be useful in the legend of Figure S2. How strong a coiled-coil prediction is there for this region?

      We have added the sequence of the coiled-coil regions to the figures with numbering. For these analyses we used the Socket2 program, which analyzes a PDB file to identify coiled-coil regions and thus does not provide a confidence score. However, inspection of the sequence and the confidence scores of the AlphaFold2 models indicates that the coiled-coil regions are not ideal, consistent with this being a regulatory feature.

      Is it clear that the fully inactive proteins are still properly folded and soluble?

      In the case of RsbU, our biophysical analysis indicates that the inactive form of the protein is soluble. While phosphatase activity is substantially reduced, our unpublished comparison of single- and multiple-turnover reactions in the absence of RsbT indicates that nearly all of the enzyme is active.

      Finally, are there other positions that would also be expected, from this model, to stabilize the coiled-coil and thus bypass the requirement for RsbT? If so, it would be good to test these. Is it the burial of amino acid at position 94 that is important, or the ability to form crossed helices?

      Because of how short the predicted coiled-coil region is, we did not identify any obvious positions that would likely have the same effect as Q94 substitution. We considered making helix-breaking mutations, which would be predicted to block RsbU activation, but favored analysis of the wildtype protein because of limitations in interpreting the effects of loss-of-function mutations.

      (4) Figure 2A, RsbT binding to RsbU: It was not entirely clear to this reviewer why one would expect the RsbT binding, not needed for activation, to be increased by the mutation that stabilizes the crossed alpha helices. The change is impressive but doesn't the lack of a need for RsbT suggest that this mutation bypasses the normal mechanism? (Is dimerization enough? Or other protein cross helices?).

      We have modified the text to clarify this point: “One prediction of our hypothesis that RsbT stabilizes the crossed alpha helices of the RsbU dimer, is that RsbT should bind more tightly to rsbU<sup>Q94L</sup> than to RsbU because the coiled-coil conformation that RsbT binds would be more energetically favorable.” Another way of putting this is that if the Q94L substitution activates RsbU through an on-pathway mechanism, RsbT must bind more tightly.

      (5) Figure 3A, Figure S3: Please label the yellow (interface) residues in RsbU and RsbT in Fig. S3 and the green (suppressor) spheres in Figure 3A.

      We have added labels to the figures as suggested.

      If RsbT interacts with the N-terminal dimerization domain and linker, why were residues 174 and 178 (from PPM domain) shown to be implicated in binding?

      The fact that residues in the switch region suppress a mutation that decreases RsbT binding suggests that this region is part of an allosteric network that links RsbT binding, the linker, and dimerization of the phosphatase domains. For example, any substitution that promotes a conformation of the phosphatase domain that is more favorable for dimerization would also promote RsbT binding. However, the precise details of how each mutation fits into this network are not clear, and we have therefore chosen not to specify a particular model to avoid overinterpreting our data.

      Are these marked in Figure S3?

      We have added labels to make this clear.

      Are these part of a dimerization interface in the C-terminal domain? Are any/all of these RsbU mutants suppressed by Q94L, as one might predict (apparently Y28I is since Q94L was again identified)?

      We chose to focus on Y28I because it was the best studied previously, but we would predict that Q94L would suppress other RsbT binding mutations.

      (6) Line 191-192: Is it surprising that no suppressors were isolated in RsbT?

      We didn’t have a preconception of whether or not it would be possible to identify similar suppressors in RsbT. Explanations for why we did not identify such suppressors could include that RsbT may be destabilized more easily by substitution, that RsbT is more constrained because it has other interaction partners, or that the particular substitutions that would suppress Y28I are less common by the PCR mutagenesis strategy we used.

      (7) Figure 3: Would the same mutants arise if the screen had been done in the absence of RsbT? Was RsbT dependence tested for the rsbU alleles?

      Our prediction is that we would not have identified any of these mutations except for Q94L in the absence of rsbT. We tested a few of the alleles and found them all to be rsbT dependent, but did not systematically test all of the alleles and therefore did not include this analysis in the manuscript.

      Given the findings earlier in the paper for Q94L, suggesting that this stabilizes the coiled-coil and shows some activity in the absence of RsbT, it seems that the interpretation of other mutants in this region (and Q94L itself) as evidence that RsbT contacts the linker directly and that contact is necessary for activation may be an overinterpretation. If these are in fact RsbT independent, they support the importance of the linker (do they further stabilize coiled-coil formation?), rather than the role of RsbT here. Are G92 and T89 on the outside of the coiled-coil? If Q94 is buried, is it qualitatively different from these others?

      G92 and T89 are predicted to be exposed. The fact that these mutations are near Q94 is part of the reason that we focused on R91 and the predicted contact with D92 of RsbT as another approach to validate the predicted interface.

      (8) Figure 3C addresses the issue of direct interaction of RsbT with the RsbU linker to some extent, given that RsbU R91E doesn't appear to have a lot of activity without RsbT. It would be helped by telling the reader what the R91 contact is initially.

      We have modified the text to clarify this point: “To test the model that RsbT activates RsbU by directly interacting with the linker to dimerize the RsbU phosphatase domains, we introduced a charge swap at position R91 that would abolish a predicted salt-bridge with RsbT D92 (Fig. 3C).”

      (9) Figure 4 and the discussion of it in the text is not likely to be easily understandable for many readers. Aside from providing a bit more explanation of what these analyses are showing, it would be useful to start the whole section (or maybe even much earlier in the paper) with the information found on lines 261-264, that other studies show that the N-terminus dimerizes stably on its own (and is it known that the C-terminus does not?). Then the discussion of the alternative models early in this section would be clearer.

      We have updated the introduction to emphasize this point: “RsbU has an N-terminal four-helix bundle domain that dimerizes RsbU and is also the binding site for RsbT, which activates RsbU as a phosphatase (Fig. 1C,D) (Delumeau et al. 2004).”

      We have also added clarification to the model presented at the beginning of this section: “A second possibility is that inactive RsbU is dimerized by the N-terminal domains but that the linkers of inactive RsbU are flexible and that the phosphatase domains only interact with each other when RsbT orders the linkers into a crossing conformation.”

      Is the dimerization of the N-terminal domains previously determined similar/the same as what is seen in the AlphaFold models used here (or the AlphaFold dimerization derived primarily from that data?).

      Yes, the dimerization in the AlphaFold models matches closely to the published structure.

      (10) Discussion and Figure 5: The final part of this work predicts AlphaFold models for a set of other phosphatases involved in initiating GSR across bacterial species, and suggests that linker-mediated phosphatase dimerization is the critical factor to activate the phosphatase. Clearly, this is the most speculative but interesting aspect of the paper. A number of possible questions are suggested by some of this:

      a. Do any of the activating mutants in RsbU and RsbP in the PPM domain (that apparently improve dimerization and thus activation) do a similar job in the other modeled proteins?

      This is an interesting question, but unfortunately most of these proteins have not been biochemically characterized. We highlight examples of RsbP and E. coli RssB for which similar activating mutations have been characterized.

      b. The legend (Figure 5G) suggests that all of the linker combinations will be coiled-coils, but that they will undergo different types of activating (and dimerizing?) transitions. Is that in fact what is being proposed here?

      Yes, this is our working hypothesis.

      c. If there is no dimerization (as noted, only weak dimerization has been reported for E. coli RssB), does that generalize the model to there are linkers and their structures are important? At the least, would the folding up of the E. coli RssB linker with antiadaptor binding be considered another mode of signal transduction or rather some sort of storage form?

      Interestingly, the P. aeruginosa RssB constitutively dimerizes, suggesting that the E. coli protein is the outlier.

      d. Would the "toolkit" model, in which different changes occur in the linker regions, suggest that the interacting proteins are going to be critical for the type of linker changes that will be important? Or something about the nature of the linkers themselves?

      This is an interesting question that we cannot yet answer. We have chosen to focus on the possible flexibility of this mechanism and anticipate that a variety of mechanisms will be used.

      e. Given the extensive comparison to E. coli RssB, the authors might consider a figure to clarify the relative domain architecture, sequences that are akin to switch regions, and others important to the discussion here.

      We tried to highlight this in Figure 5C including coloring the regions similar to the switch regions.

      Reviewer #3 (Recommendations for the authors):

      Given the caveats noted above related to the reliability of computed structure models, I would recommend the authors make the following additions/modifications to their manuscript:

      (1) The authors should show alpha fold models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We have added these models to figure 1 – figure supplement 2.

      (2) Because of the points mentioned above the authors should tone down the generalisation relating to the activation mechanism of this family of phosphatases presented in the discussion.

      We have modified the paper throughout to emphasize where we are speculating.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1:

      Summary:

      Kimura et al performed a saturation mutagenesis study of CDKN2A to assess functionality of all possible missense variants and compare them to previously identified pathogenic variants. They also compared their assay result with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates cell cycle and apoptosis; therefore it is critical to accurately assess functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution. Many of these issues were addressed during the revision process.

      Comments on revisions:

      The manuscript was improved during the revision process.

      We thank the reviewer for their comments. We are grateful for the opportunity to provide additional information and data to clarify our approach and study results.

      Reviewer #2:

      Summary:

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors currently utilized by diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      Review:

      The goal of this paper was to perform functional classification of missense mutations in CDKN2A in order to generate a resource to aid in clinical interpretation of CDKN2A genetic variants identified in clinical sequencing. In our initial review, we concluded that this paper was difficult to review because there was a lack of primary data and experimental detail. The authors have significantly improved the clarity, methodological detail and data exposition in this revision, facilitating a fuller scientific review. Based on the data provided we do not think the functional characterization of CDKN2A variants is robust or complete enough to meet the stated goal of aiding clinical variant interpretation. We think the underlying assay could be used for this purpose but different experimental design choices and more replication would be required for these data to be useful. Alternatively, the authors could also focus on novel CDKN2A variants as there seems to be potential gain of function mutations that are simply lumped into "neutral" that may have important biological implications.

      Major concerns:

      Low experimental concordance. The p-value scatter plot (Figure 2 Figure Supplement 3A) across 560 variants shows low collinearity indicating poor replicability. These data should be shown in log2fold changes, but even after model fitting with the gamma GLM still show low concordance which casts strong doubt on the function scores.

      Concordance among non-significant p-values is generally low because most of the signal comes from random variability across repeats. If the observed log2 fold change between the repeats is entirely due to noise, one would expect two repeated p-values to behave like independent random uniforms. True concordance is typically more evident in significant p-values because they reflect consistent effects above random noise. Functionally deleterious variants are called when their associated p-value is significant. To confirm this statement, a scatter plot with the log2 normalized fold change was added in Figure 2-figure supplement 3C. We see low concordance between repeats in the log2 normalized fold changes centered around 0, corresponding to log2 normalized fold changes that are mainly due to noise. The concordance increases as the variants become significant. One can notice that the correlation coefficient between duplicate assay results was almost identical between the model-based p-values and log2 normalized fold change (Figure 2-figure supplement 3A and 3C, Appendix 1-table 4, and Appendix 1-table 6). Also, importantly, no variant was functionally deleterious in one replicate and functionally neutral in another, implying a perfect concordance in calls if we exclude variants that were called indeterminate in one of the two repeats. Finally, of variants with discordant classifications, only 6 of 560 (1.1%) were functionally deleterious (significant p-value) in one replicate and of indeterminate function in another. We have updated the text as follows:

      “Of variants with discordant classifications, 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another, while 102 variants (18.2%) were functionally neutral in one replicate and of indeterminate function in another. Importantly, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1 -table 4). Furthermore, the correlation coefficient between duplicate assay results was similar using the gamma GLM and log2 normalized fold change (Figure 2-figure supplement 3A and 3C).”
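
      The statistical argument in this response, that replicate p-values for variants with no real effect behave like independent uniform draws while real effects produce agreement between replicates, can be illustrated with a small simulation. This is only a sketch under stated assumptions: each replicate is modelled as a hypothetical z-score with unit noise and a one-sided p-value, which stands in for, and is not, the gamma GLM used in the study. The function names, effect size, variant counts, and seeds are all illustrative.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_replicate_pvalues(effects, seed):
    """Return one-sided p-values for two independent replicates per variant.

    `effects` holds the assumed true effect (in z-score units) of each
    variant; an effect of 0 means the measurement is pure noise.
    """
    rng = random.Random(seed)
    standard_normal = NormalDist()
    rep1, rep2 = [], []
    for effect in effects:
        z1 = rng.gauss(effect, 1.0)  # replicate 1: true effect plus noise
        z2 = rng.gauss(effect, 1.0)  # replicate 2: same effect, new noise
        rep1.append(1.0 - standard_normal.cdf(z1))
        rep2.append(1.0 - standard_normal.cdf(z2))
    return rep1, rep2


# All variants are noise: replicate p-values are independent uniforms.
null_rep1, null_rep2 = simulate_replicate_pvalues([0.0] * 2000, seed=1)
r_null = pearson(null_rep1, null_rep2)

# Half noise, half with a real effect: replicates now agree with each other.
mixed_effects = [0.0] * 1000 + [3.0] * 1000
mixed_rep1, mixed_rep2 = simulate_replicate_pvalues(mixed_effects, seed=2)
r_mixed = pearson(mixed_rep1, mixed_rep2)

print(f"replicate correlation, noise only:         {r_null:.3f}")
print(f"replicate correlation, noise plus effects: {r_mixed:.3f}")
```

      In this toy model the correlation between replicates comes entirely from the variants that carry a real effect, which is consistent with the authors' statement that concordance increases as variants become significant. It does not by itself address the reviewer's concern about how many variants fall in the noise-dominated range.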

      The more detailed methods provided indicate that the growth suppression experiment is done in 156 pools with each pool consisting of the 20 variants corresponding to one of the 156 aa positions in CDKN2A. There are several serious problems with this design.

      Batch effects in each of the pools prevent comparison across different residues. We think this is a serious design flaw and not standard for how these deep mutational scans are done. The standard would be to combine all 156 pools in a single experiment. Given the sequencing strategy of dividing up CDKN2A into 3 segments, the 156 pools could easily have been collapsed into 3 (1 to 53, 54 to 110, 111 to 156). This would significantly minimize variation in handling between variants at each residue and would be more manageable for performance of further replicates of the screen for reproducibility purposes. The huge variation in confluency time (16-40 days) for each pool suggests that this batch effect is a strong source of variation in the experiment.

      While there is variation in time to confluency between different amino acid residues, we do not anticipate this batch effect to significantly affect variant classifications in our study. For example, our results were generally consistent with previous classifications. All synonymous variants (one per residue) and benchmark benign variants assayed were classified as functionally neutral. Furthermore, of benchmark pathogenic variants assayed, none were classified as functionally neutral: 84% were classified as functionally deleterious and 16% were classified as of indeterminate function.

      Lack of experimental/biological replication: The functional assay was only performed once on all 156 CDKN2A residues and was repeated for only 28 out of 156 residues, with only ~80% concordance in functional classification between the first and second screens. This is not sufficiently robust for variant interpretation. Why was the experiment not performed more than once for most aa sites?

      In our study we determined functional classifications for all CDKN2A missense variants while assessing variability with replicates across 28 residues. Of these variants, only 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another. Furthermore, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1 -table 4). As noted above, we provided additional context in the manuscript.

      For the screen, the methods section states that PANC-1 cells were infected at MOI=1 while the standard is an MOI of 0.3-0.5 to minimize multiple variants integrating into a single cell. At an MOI =1 under a Poisson process which captures viral integration, ~25% of cells would have more than 1 lentiviral integrant. So in 25% of the cells the effect of a variant would be confounded by one or more other variants adding noise to the assay.

      As noted previously, we are not able to differentiate effects due to multiple viral integrations per cells. However, we do not anticipate multiple viral integrations to significantly affect variant classifications in our study as our results are consistent with previous classifications, as described above.
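
      The reviewer's Poisson estimate above can be checked with a short calculation. The sketch below assumes that the number of lentiviral integrants per cell follows an ideal Poisson distribution whose mean equals the MOI; the function names are illustrative and are not taken from the manuscript. Under that assumption, MOI = 1 gives 1 - 2/e, or about 26%, of all cells with more than one integrant (in line with the reviewer's ~25%), which corresponds to roughly 42% of the cells that carry at least one integrant.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def integration_fractions(moi: float) -> dict:
    """Fractions of cells by integrant count, assuming Poisson integration.

    `moi` is treated as the mean number of integrants per cell, which is
    an idealisation of a real transduction.
    """
    none = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    multiple = 1.0 - none - single   # two or more integrants
    transduced = 1.0 - none          # at least one integrant
    return {
        "none": none,
        "single": single,
        "multiple": multiple,
        "multiple_among_transduced": multiple / transduced,
    }


for moi in (0.3, 0.5, 1.0):
    fractions = integration_fractions(moi)
    print(
        f"MOI {moi:.1f}: "
        f"multiple integrants in {fractions['multiple']:.1%} of all cells, "
        f"{fractions['multiple_among_transduced']:.1%} of transduced cells"
    )
```

      Whether the fraction is reported per cell or per transduced cell matters if untransduced cells are removed by selection, because only transduced cells then contribute to the variant counts.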

      While the authors provide more explanation of the gamma GLM, we strongly advise that the heatmap and replicate correlations be shown with the log2 fold changes rather than the fit output of the p-values.

      Thank you for the suggestion. As noted, we provide additional explanation in the manuscript about why we classified variants using a gamma GLM. Using a gamma GLM, classification thresholds were determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications were therefore not based on assay outputs for previously reported – benchmark – pathogenic or benign variants to determine thresholds. We strongly prefer using p-values and classifications using the gamma GLM in the manuscript. However, comparisons of assay outputs using a gamma GLM and log2 fold change are included in the manuscript. Read counts, log2 fold change, and classifications based on log2 fold change are presented in the manuscript for all variants. Readers who wish to use these data may do so and we refer them to the manuscript text, Appendix 1 -table 4, Appendix 1 -table 6, and Figure 2 -figure supplement 2.

      In this study, the authors only classify variants into the categories "neutral", "indeterminate", or "deleterious" but they do not address CDKN2A gain-of-function variants that may lead to decreased proliferation. For example, there is no discussion on variants at residue 104, whose proliferation values mostly consist of higher magnitude negative log2fold change values. These variants are defined as neutral but from the one replicate of the experiment performed, they appear to be potential gain-of-function variants.

      We have added a comment to the discussion to highlight that we did not identify potential gain-of-function variants. Specifically:

      “We classified CDKN2A missense variants using a gamma GLM, as either functionally deleterious, of indeterminate function, or functionally neutral. However, we did not classify variants that may have gain-of-function effects, resulting in decreased representation in the cell pool. Future studies are necessary to determine the prevalence and significance of CDKN2A gain-of-function variants.”

      Minor concerns:

      The differentiation between variants of "neutral" and "indeterminate" function seems unnecessary and it seems like there are too many variants that fall into the "indeterminate" category. The authors seem to have set numerical thresholds for CDKN2A function using benchmark variants of known function. While the benchmark variants are important as a frame of reference for the "dynamic range" of the assay, their function scores should not necessarily be used to define hard cutoffs of whether a variant's function score can be interpreted.

      We did not utilize benchmark variants to define thresholds for functional classifications using a gamma GLM. This is one of the strengths of using a gamma GLM for classification. As explained in our manuscript, classification thresholds were determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications were therefore not based on assay outputs for previously reported – benchmark – pathogenic or benign variants. While not required when using a gamma GLM, we included indeterminate classifications, which are not uncommon.

      Figure 2 supplement 2 - on the x-axis, should "intermediate" be "indeterminate"?

      This, and a similar typographical error in Figure 2 -figure supplement 3, has been corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public Review):

      This study elucidates the toxic effects of the lipid aldehyde trans-2-hexadecenal (t-2-hex). The authors show convincingly that t-2-hex induces a strong transcriptional response, leads to proteotoxic stress and causes the accumulation of mitochondrial precursor proteins in the cytosol.

      The data shown are of high quality and well-controlled. The genetic screen for mutants that are hyper- and hyposensitive to t-2-hex is elegant and interesting, even if the mechanistic insights from the screen are rather limited. Moreover, the authors show evidence that t-2-hex affects subunits of the TOM complex. However, they do not formally demonstrate that the lipidation of a TOM subunit is responsible for the toxic effect of t-2-hex. A t-2-hex-resistant TOM mutant was not identified. Nevertheless, this is an interesting and inspiring study of high quality. The connection of proteostasis, mitochondrial biogenesis and sphingolipid metabolism is exciting and will certainly lead to many follow-up studies.

      Reviewer #3 (Public Review):

      Summary:

      The authors investigate the effect of high concentrations of the lipid aldehyde trans-2-hexadecenal (t-2-hex) in a yeast deletion strain lacking the detoxification enzyme. Transcriptomic analyses as a global readout reveal that a large range of cellular functions across all compartments are affected (transcriptomic changes affect 1/3 of all genes). The authors provide additional analyses, from which they build a model in which mitochondrial protein import is blocked by modification of Tom40.

      Our initial transcriptomic study with high doses of t-2-hex in a detoxification mutant was only a starting experiment, aimed at identifying as many determinants of t-2-hex toxicity as possible, as stated in the manuscript. From this, we developed multiple independent approaches in wild-type (and mutant) cells at low t-2-hex concentrations, demonstrating that proteostasis and mitochondrial protein trafficking are physiologically important targets of the pro-apoptotic lipid. Specifically, proteostasis-specific PACE reporters are robustly induced in a detoxification mutant by 5 µM t-2-hex (Figure 3D,E) and significantly induced by 10 µM t-2-hex in detoxification-competent wild-type cells (new Figure 3F).

      We do not propose Tom40 as the lipid's primary target. Rather, we show that several subunits of the TOM (and TIM) complex are directly targeted by low t-2-hex concentrations in vitro (Figure 8B), and that Tom20 and Tom70 are important for lipid toxicity (Figure 8D) and mitochondrial protein trafficking in vivo (Suppl. Figure 2).

      Strengths:

      Global analyses (transcriptomic and functional genomics approach) to obtain an overview of changes upon yeast treatment with high doses of t-2-hex.

      Weaknesses:

      The use of high concentrations of t-2-hex in combination with a deletion of the detoxifying enzyme Hfd1 limits the possibility to identify physiologically relevant changes. From the hundreds of identified targets the authors focus on mitochondrial proteins, a choice that is not clearly comprehensible from the data.

      The initial transcriptomic study with high doses of t-2-hex in a detoxifying mutant is a starting experiment and was aimed at identifying as many determinants of t-2-hex toxicity as possible, as stated in the manuscript. As stated (page 4), genes up-regulated (>2 log2FC) by t-2-hex were selected and subjected to GO category enrichment analysis (Supplemental Table 1). We found that “Mitochondrial organization” was the most numerous GO group activated by t-2-hex. Among the strongly t-2-hex induced genes encoding mitochondrial proteins, CIS1 represented the most inducible gene with a known mitochondrial function. Cis1 is the central protein of the MitoCPR pathway, which is specifically induced upon and protects from mitochondrial protein import stress. We further show that proteostasis and mitochondrial protein trafficking are physiologically important targets at low t-2-hex doses in several independent experimental approaches: proteostasis-specific PACE reporters are robustly induced in a detoxification mutant by 5 μM t-2-hex (Figure 3D,E) and significantly induced by 10 μM t-2-hex in detoxification competent wild type cells (new Figure 3F); mitochondrial pre-protein accumulation is induced by 10 μM t-2-hex in wild type cells (Figure 5G); several subunits of the TOM and TIM complexes are lipidated by low (10 μM) t-2-hex doses in wild type cell extracts (Figure 8B); and mitochondrial import assays with mt-GFP in intact yeast wild type cells reveal that t-2-hex significantly inhibits import at low (5 μM) t-2-hex concentrations (new Suppl. Figure 1). The 5-10 μM t-2-hex applied here is considerably lower than the published data in human cells with ≥25 μM on intact cells or cell extracts (Jarugumilli et al. 2018).

      The main claim of the manuscript that t-2-hex targets the TOM complex and inhibits mitochondrial protein import is not supported by experimental data as import was not experimentally investigated. The observed accumulation of precursor proteins could have many other reasons (e.g. dissipation of membrane potential, defects in mitochondrial presequence proteases, defects in cytosolic chaperones, modification of mitochondrial precursors by t-2-hex rendering them aggregation prone and thus non-import competent). However, none of these alternative explanations have been experimentally addressed or discussed in the manuscript.

      We have now performed additional experiments, alternative to the pre-protein quantifications, showing that t-2-hex specifically inhibits mitochondrial protein import. We investigated the effect of t-2-hex on mitochondrial protein import using flow cytometric GFP assays in live yeast cells. Specifically, we compared the expression and maturation of GFP targeted either to the cytosol or the mitochondrial matrix and show that low doses of t-2-hex (≥5 μM) significantly inhibited mt-GFP activity compared to cytosolic GFP in wild-type cells (new Supplemental Figure 1B). In contrast, this inhibition was not observed with the saturated derivative, t-2-hex-H2. Flow cytometric rhodamine123 assays revealed that t-2-hex did not alter ΔΨm within the concentration range that efficiently inhibits mt-GFP activity (new Supplemental Figure 1C). Alternative t-2-hex effects such as the direct modification of mitochondrial pre-proteins or cytosolic chaperones, potentially making the precursors prone to aggregation, are less likely, as the mitochondrial and cytosolic GFP used in these import studies differ only by the small, cysteine-free PreSu9 pre-peptide. This information is now included in the Results and Discussion sections.

      Furthermore, many of the results have been reported before (interaction of Tom22 and Tom70 with Hfd1) or observed before (TOM40 as target of t-2-hex in human cells).

      The interaction of Tom22 or Tom70 with Hfd1 has been only reported in high throughput pull-down studies in yeast (Opalinski et al., 2018 and Burri et al., 2006), and no functional connection between Hfd1 lipid detoxification and TOM has been investigated. Here we corroborate these high throughput results by targeted pull-down experiments, which strengthens the new finding that Hfd1 functionally interacts with the TOM complex. Tom40 has been found to be lipidated by high t-2-hex concentrations in human cell extracts in high throughput in vitro proteomic studies (Jarugumilli et al., 2018), but no functional connection between human TOM and t-2-hex has been investigated so far. Here we corroborate these high throughput results by targeted experiments, which strengthens the new findings that t-2-hex and TOM interact functionally.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Congratulations on this exciting study. Even if some of the mechanistic details will have to be addressed in further studies (which of the modified sites are physiologically relevant; which sites are modified in vivo without external addition of t-2-hex) this study is inspiring and opens a new direction of mitochondrial research. I therefore fully support publication of this nice study in its current form.

      Reviewer #3 (Recommendations For The Authors):

      Two of the reviewers pointed out that the observation of precursors in whole cell extract is not sufficient to draw conclusions on mitochondrial protein import rates. The authors did not provide any new experiments but argued that a recent publication (Weidberg and Amon, 2018) had used the same readout for this conclusion. Why this manuscript was accepted with this statement is not known to this reviewer, but it does not change the fact that the conclusion is not valid. Many alternative explanations are possible (see public review) and the claim that the import competence of the TOM complex is affected upon t-2-hex treatment is not appropriate.

      We have now performed new experiments addressing the inhibition of mitochondrial protein import by t-2-hex as an alternative to our precursor accumulation assays. We compared the induced expression of cytosolic and mitochondrial GFP by flow cytometry as a quantitative mitochondrial import assay (Sirk et al., Cytometry A. 2003 Nov; 56(1) 15-22). Low doses of t-2-hex (≥5 μM) significantly inhibited mt-GFP activity as compared to cytosolic GFP in wild-type cells (new Supplemental Figure 1B). This inhibition of mitochondrial GFP is independent of mitochondrial membrane potential perturbation (new Supplemental Figure 1C) and alternative t-2-hex effects, such as the direct modification of the mtGFP precursor or cytosolic chaperones are less likely, as the mitochondrial and cytosolic GFP used in these import studies differ only by the small, cysteine-free PreSu9 pre-peptide.

      The first sentence of the abstract states that t-2-hex “induces mitochondrial dysfunction in a conserved manner from yeast to human”. I find two issues with this statement: 1) if the mechanism is known, what is the question addressed in the present manuscript, and 2) the second sentence of the results fully contradicts the above sentence: “In human cells, t-2-hex causes mitochondrial dysfunction by directly stimulating Bax-oligomerisation at the outer mitochondrial membrane. In yeast, however, t-2-hex efficiently interferes with mitochondrial function and cell growth in a Bax independent manner.”

      We agree that the first sentence was misleading; this has now been fixed in the revised version.

      The first reviewer requested a repetition of key experiments with lower concentrations and the authors provided additional in vitro data; however, for this, 10 μM is still very high. To gain valuable and physiologically relevant data the initial transcriptomic analysis should be repeated with a low amount and in a wild-type yeast background.

      Published t-2-hex chemoproteomic experiments on human cell extracts were performed at higher concentrations (>25 μM) and human Bax is hardly lipidated by 10 μM t-2-hex (Jarugumilli et al., 2018); therefore the in vitro lipidation data provided in our study should be considered a low t-2-hex dose. The initial transcriptomic study with high doses of t-2-hex in a detoxifying mutant is a starting experiment and was aimed at identifying as many determinants of t-2-hex toxicity as possible. Building on this, we further show that proteostasis and mitochondrial protein trafficking, the relevant cellular functions for our study, are physiologically important targets at low t-2-hex doses in several independent experimental approaches: proteostasis-specific gene expression is robustly induced in a detoxification mutant by 5 μM t-2-hex (Figure 3D,E) and significantly induced by 10 μM t-2-hex in detoxification competent wild type cells (new Figure 3F); mitochondrial pre-protein accumulation is induced by 10 μM t-2-hex in wild type cells (Figure 5G); several subunits of the TOM and TIM complexes are lipidated by low (10 μM) t-2-hex doses in vitro in wild type extracts (Figure 8B); and mitochondrial import assays with mt-GFP in intact yeast wild type cells reveal that t-2-hex significantly inhibits import at low (5 μM) t-2-hex concentrations (new Suppl. Figure 1).

      As already stated above there are many alternative explanations for the observed accumulation of precursor proteins, e.g. the decreased proteasome activity could be cause and not consequence. Also, the modification of precursors directly upon translation in the cytosol could likely impact on their further transport and result in direct aggregation in the cytosol.

      As mentioned above, we have now corroborated the t-2-hex specific mitochondrial protein import defect by alternative in vivo experiments, which are not dependent on the accumulation of mitochondrial precursors. We have now tested the possibility that decreased proteasome activity could indirectly inhibit mitochondrial import. This is not the case because a rpn4 mutant with impaired proteasomal activity induces normal mtGFP levels (new Suppl. Figure 1D). We cannot exclude that the modification of precursors by t-2-hex upon translation might additionally impact the transport of some mitochondrial pre-proteins. However, the mitochondrial and cytosolic GFP used in the import studies differ only in the small cysteine-free PreSu9 pre-peptide, making it very unlikely that precursor lipidation is secondarily responsible for the observed import defect.

      Many of the comments after first reviewing the manuscript were not addressed experimentally although many of the suggested experiments are easy to perform. I can only encourage the authors to provide more experimental support and controls, as the claims are currently not sufficiently supported.

      In the two revisions of our manuscript, we have included several control experiments to better link the pro-apoptotic lipid t-2-hex with mitochondrial import stress. These include: in vitro lipidation of TOM/TIM subunits by low t-2-hex concentrations, t-2-hex tolerance and recovery of mitochondrial protein import in specific tom mutants, inhibition of mitochondrial protein import (pre-protein and mtGFP assays) by low t-2-hex doses independently of mitochondrial membrane potential and proteasome activity, and induction of proteostasis-specific gene expression by low t-2-hex doses.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      We thank Reviewer #1 for their thorough review and for recognizing both the significance of our work and the potential of our strategy to expand saRNA applications beyond vaccines.

      Weaknesses:

      (1) Impact on Cellular Translation:

      The authors demonstrate that modified saRNAs with additional components enhance transgene expression by inhibiting dsRNA-sensing pathways. However, it is unclear whether these modifications influence global cellular translation beyond the expression of GFP and mScarlet-3 (which are encoded by the saRNA itself). Conducting a polysome profiling analysis or a puromycin labeling assay would clarify whether the modified saRNAs alter overall translation efficiency. This additional data would strengthen the conclusions regarding the specificity of dsRNA-sensing inhibition.

      We thank the reviewer for this helpful insight and suggestion. We aim to conduct a puromycin labelling assay to clarify the effect of the various saRNA constructs on translation efficiency.

      (2) Stability and Replication Efficiency of Long saRNA Constructs:

      The saRNA constructs used in this study exceed 16 kb, making them more fragile and challenging to handle. Assessing their mRNA integrity and quality would be crucial to ensure their robustness.

      Furthermore, the replicative capacity of the designed saRNAs should be confirmed. Since Figure 4 shows lower inflammatory cytokine production when encoding srIkBα and srIkBα-Smad7-SOCS1, it is important to determine whether this effect is due to reduced immune activation or impaired replication. Providing data on replication efficiency and expression levels of the encoded anti-inflammatory proteins would help rule out the possibility that reduced cytokine production is a consequence of lower replication.

      This is another very helpful comment. We will conduct an analysis of saRNA integrity and quality by denaturing gel electrophoresis. To examine replicative capacity of the saRNA constructs, we aim to conduct RT-qPCR experiments.

      (3) Comparative Data with Native saRNA:

      Including native saRNA controls in Figures 5-7 would allow for a clearer assessment of the impact of additional genes on cytokine production. This comparison would help distinguish the effect of the encoded suppressor proteins from other potential factors.

      Thank you for your suggestion. We will implement this change in the next version of the manuscript.

      (4) In vivo Validation and Safety Considerations:

      Have the authors considered evaluating the in vivo potential of these saRNA constructs? Conducting animal studies would provide stronger evidence for their therapeutic applicability. If in vivo experiments have not been performed, discussing potential challenges - such as saRNA persistence, biodistribution, and possible secondary effects - would be valuable.

      (5) Immune Response to Viral Proteins:

      Since the inhibitors of dsRNA-sensing proteins (E3, NSs, and L*) are viral proteins, they would be expected to induce an immune response. Analyzing these effects in vivo would add insight into the applicability of this approach.

      We recognize the importance of in vivo studies and immune cell responses and plan to incorporate in vivo imaging in future studies to investigate these interactions, as well as examining delivery of various cargoes via saRNA to determine potential therapeutic benefits in different animal models of inflammatory pain, but such studies are beyond the scope of this current investigation. As suggested by the reviewer, we will incorporate a section on potential challenges of in vivo saRNA work in the revised manuscript.

      (6) Streamlining the Discussion Section:

      The discussion is quite lengthy. To improve readability, some content - such as the rationale for gene selection - could be moved to the Results section. Additionally, the descriptions of Figure 3 should be consolidated into a single section under a broader heading for improved coherence.

      Thank you for your suggestions; we will make these changes in the next revision.

      Reviewer #2 (Public review):

      Summary:

      Lim et al. have developed a self-amplifying RNA (saRNA) design that incorporates immunomodulatory viral proteins, and show that the novel design results in enhanced protein expression in vitro in mouse primary fibroblast-like synoviocytes. They test constructs including saRNA with the vaccinia virus E3 protein and another with E3, Toscana virus NS protein and Theiler's virus L protein (E3 + NS + L), and another with srIκBα-Smad7-SOCS1. They have also tested whether ML336, an antiviral, enables control of transgene expression.

      Strengths:

      The experiments are generally well-designed and offer mechanistic insight into the RNA-sensing pathways that confer enhanced saRNA expression. The experiments are carried out over a long timescale, which shows the enhanced effect of the saRNA E3 design compared to the control. Furthermore, the inhibitors are shown to maintain the cell number and reduce basal activation factor-α levels.

      We thank Reviewer #2 for their detailed assessment and recognition of the mechanistic insights provided by our study.

      Weaknesses:

      One limitation of this manuscript is that the RNA is not well characterized; some of the constructs are quite long and the RNA integrity has not been analyzed. Furthermore, for constructs with multiple proteins, it's imperative to confirm the expression of each protein to confirm that any therapeutic effect is from the effector protein (e.g. E3, NS, L). The ML336 was only tested at one concentration; it is standard in the field to do a dose-response curve. These experiments were all done in vitro in mouse cells, thus limiting the conclusion we can make about mechanisms in a human system.

      We agree that these are weaknesses of our work. We plan to address some of these weaknesses by performing a dose response curve for ML336, examining saRNA integrity through denaturing gel electrophoresis, and will also aim to provide additional evidence for effects of effector proteins through RT-qPCR. We are also looking into testing these constructs in patient-derived FLS.

    1. Author response:

      Thanks for the positive review of our manuscript and for appreciating our work.

      We align in many ways with the reviewers' comments. Our initial finding concerning the slight shift of f_free in a/b neurons after conditioning is interesting, but we agree it would certainly deserve a follow-up to substantiate its link with memory formation. We also agree that an analysis in distribution rather than through an averaged signal might be more sensitive.

      We however have to cope with the fact that extending our investigation would require manpower resources that are no longer available. Therefore we appreciate the suggestion made by the 3 reviewers to restrain the claim and hence change the title to "In vivo NAD(P)H autofluorescence lifetime imaging reveals metabolic heterogeneity within the Drosophila mushroom body". We find it better matches the scope of this study, which is mostly to showcase the potential of NAD(P)H FLIM to quantify variations in metabolism in the Drosophila brain rather than to firmly test a specific hypothesis linked to memory formation. In this respect, we do provide quantitative results showing metabolic profile variations between brain tissues such as the somata and calyx regions but also between different Kenyon cell subtypes. We would then present the shifts of f_free induced by conditioning as a curio that might entice future work, as advised by Reviewer #2.

      Altogether, in the revised version we will change the title to restrain the claim and move two supplementary figures to the main figures to better focus on and describe the registration process. We will also correct the figure panels pointed out by the reviewers and add individual samples to our boxplots. We will also slightly compress the introduction and expand the discussion on potential applications. Finally, we will evaluate whether statistical tests based on distributions are more sensitive for detecting a significant shift in FLIM signal in the a/b KCs after conditioning, to strengthen our last observation if confirmed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate that two human preproprotein mutations in the BMP4 gene cause a defect in proprotein cleavage and BMP4 mature ligand formation, leading to hypomorphic phenotypes in mouse knock-in alleles and in Xenopus embryo assays.

      Strengths:

      They provide compelling biochemical and in vivo analyses supporting their conclusions, showing the reduced processing of the proprotein and concomitant reduced mature BMP4 ligand protein, impressively from mouse embryonic lysates. They perform excellent analysis of the embryo and post-natal phenotypes demonstrating the hypomorphic nature of these alleles. Interesting phenotypic differences between the S91C and E93G mutants are shown with excellent hypotheses for the differences. Their results support that BMP4 heterodimers act predominantly throughout embryogenesis whereas BMP4 homodimers play essential roles at later developmental stages.

      Weaknesses:

      (1) A control of BMP7 alone in the Xenopus assays seems important to exclude BMP7 homodimer activity in these assays.

      We and others have shown that BMP7 homodimers have weak or no activity while BMP4/7 heterodimers signal at a much higher level than either BMP4 or BMP7 homodimers in Xenopus ectodermal and mesodermal cells. We have expanded the description of these published findings in the results section (lines 182-187). We have also added representative examples of experiments in which BMP4 and BMP7 alone controls are included (new Fig. S2). Since the level of activity of BMP7 + BMP4 variants is equivalent to that of BMP7 + WT BMP4, this cannot be accounted for by BMP7 homodimers.

      (2) The Discussion could be strengthened by more in-depth explanations of how BMP4 homodimer versus heterodimer signaling is supported by the results, so that readers do not have to think it all through themselves. Similarly, a discussion of why the S91C mutant has a stronger phenotype than E93G early in the Discussion would be helpful, or at least mention that it will be addressed later.

      We have revised the discussion as suggested by the reviewer. Please see responses to recommendations 2-4 below.

      Reviewer #1 (Recommendations for the authors):

      (1) A control of BMP7 injection alone seems missing when comparing the BMP4/7 variants. BMP4 in the embryo assays presented in Fig 1. Is it not possible that the activity observed is BMP7 homodimers, perhaps due to inhibited heterodimer formation by the BMP4 variant?

      Multiple published studies have shown that BMP7 homodimers have weak or no activity in Xenopus ectodermal and mesodermal cells, and that ½ dose of RNA encoding BMP4 and BMP7 together signals at a higher level than does a full dose of RNA encoding either BMP4 or BMP7 alone. We have expanded our description of these published findings (lines 182-187), have included additional details about RNA doses that were injected (line 156, 175, 182) and have added representative examples of experiments in which BMP4 and BMP7 controls were included in a new Figure (Fig. S2).

      (2) In reading the Discussion, I was continually thinking of the stronger phenotype of the S91C mutant compared to the E93G one, although both are discussed together throughout most of the Discussion. Only at the end of the Discussion is the stronger phenotype of S91C discussed with a compelling explanation for the stronger phenotype, not related to the phosphorylation site function. I wonder if it would be better placed earlier in Discussion or at least mentioned the difference in phenotypes that will be discussed later.

      We have moved the possible explanation of differences between Bmp4<sup>S91C</sup> and Bmp4<sup>E93G</sup> mutants to immediately follow the introductory paragraph of the results section.

      (3) Along these same lines, why is it that the E93G exhibits rather normal cleavage at E10.5? Might the mechanisms of cleavage vary in different contexts with phosphorylation-dependent cleavage not functioning at early stages of development? I believe the hypothesis is that it is cleaved due to heterodimerization with BMP7. More discussion of this excellent hypothesis should be provided with clear statements, rather than inferences, if I'm understanding this correctly. For example, I had to read 3 times the first sentence of the last paragraph on p.14 before I understood it. Better to break that sentence down and the one that follows it, so it is easier to understand.

      We have rewritten and expanded the paragraphs describing phenotypic and biochemical evidence for defective homodimer but not heterodimer signaling as suggested (lines 343-375). We have also more explicitly stated the possibility that normal cleavage of BMP4<sup>E93G</sup> in embryonic lysates may be due to a predominance of BMP4/7 heterodimers in early embryonic stages or spatiotemporal differences in phosphorylation-dependent cleavage of BMP4 homodimers (lines 369-372).

      (4) Similarly the last paragraph of the Discussion mentions that the authors provide evidence of BMP4 homodimer signaling. I agree with the authors, but I had to think through the evidence myself. Better if the authors clearly explain the evidence that points to this, as this is a very good point of

      See response to point 3, above. Thank you for these useful suggestions.

      (5) Last sentence, first paragraph on p.11 should be qualified for the E93G mutant to E13.5, since it was normal at E10.5 regarding Figure 4 results.

      Thank you for pointing this out. It has been corrected.

      (6) Skip the PC acronym, since it is only repeated once in the text and hard to remember almost 10 pages later when it is used again.

      We have corrected this.

      (7) In the Discussion, a typo in "a single intramolecular disulfide bond that stabilizes the dimer", should be 'intermolecular'.

      Thank you for catching our switch in the use of inter- and intramolecular. We have corrected this (lines 334-335).

      (8) At times the E93G mutant is referred to as having early lethality, often in conjunction with S91C, while other times it is referred to as late lethality. Considering that the homozygotes die postnatally after weaning, most would consider it late lethality. In contrast S91C is indeed an early lethal.

      We have changed the wording in the introduction to state that “mice carrying Bmp4<sup>S91C</sup> or Bmp4<sup>E93G</sup> knock in mutations show embryonic or enhanced postnatal lethality, respectively,… (lines 141-143)” and have removed the word “early” from the title.

      Reviewer #2 (Public review):

      Summary:

      Kim et al. report that two disease mutations in proBMP4, Ser91Cys and Glu93Gly, which disrupt the Ser91 FAM20C phosphorylation site, block the activation of proBMP4 homodimers. Consequently, analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced pSmad1 and tbxt1 expression. The block in BMP4 activity caused by the mutations could be overcome by co-expression of BMP7, suggesting that the missense mutations selectively affect the activity of BMP4 homodimers but not BMP4/7 heterodimers. The expert amphibian tissue transplant studies were extended to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, demonstrating the impact of these mutations on embryonic development, particularly in female mice, in line with patient studies. Finally, studies in MEFs revealed that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI (AlphaFold) modeling of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292.

      Strengths:

      The Xenopus and mouse studies are valuable and elegantly describe the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development.

      Weaknesses:

      The interpretation of how the mutations may disturb the furin-mediated cleavage of proBMP4 is underdeveloped and does not consider all of their data. Understanding how pS91 influences the furin-dependent cleavage at Arg292 seems to be the crux of this work and thus warrants more consideration. Specifically:

      (1) Figure S1 may be significantly more informative than implied. The authors report that BMP4S91D activates pSmad1 only incrementally better than S91C and much less than WT BMP4. However, Fig. S1B does not support the conclusion on page 7 (numbering beginning with title page); "these findings suggest that phosphorylation of S91 is required to generate fully active BMP4 homodimers". The authors rightly note that the S91C change likely has manifold effects beyond inhibiting furin cleavage. The E93G change may also affect proBMP4 beyond disturbing FAM20C phosphorylation. Additional mutation analyses would strengthen the work.

      The major goal of generating and comparing the activity of the S91D mutant with S91C was to control for phosphorylation independent defects caused by the deleterious introduction of a cysteine residue, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. S91D has significantly higher activity than S91C (p<0.01) and has a less significant loss of activity (p<0.05) than does S91C (p<0.0001) relative to wild type BMP4 (Fig. S1), consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have rewritten this section to clarify our interpretation (lines 165-174) and have changed our statement that our activity data “suggest the importance of phosphorylation” to a statement that they are consistent with this possibility (lines 179-180). We do not believe that further mutational analysis using activity assays in Xenopus would shed light on how or whether phosphorylation affects proteolytic activation of BMP4.

      (2) These findings in Figure S1 are potentially significant because they may inform how proBMP4 is protected from cleavage during transit through the TGN and entry into peripheral cellular compartments. Intriguing modeling studies in Figure 6 suggest that pSer91 is proximal to the furin cleavage site. Based on their presentation, pSer91 may contact Arg289, the critical P4 residue at the furin site. If so, might that suggest how pS91 may prevent furin cleavage, thus explaining why the S91D mutation inhibits processing as presented, and possibly how proBMP4 processing is delayed until transit to distal compartments (perhaps activated by a change in the endosomal microenvironment or a Ser91 phosphatase)? Have the authors considered or ruled out these possibilities? In addition to additional mutation analyses of the FAM20C site, moving the discussion of this model to an "Ideas and Speculation" subsection may be warranted.

      The model shown in Fig. 6B proposes the possibility that phosphorylation unmasks (rather than masks) the furin cleavage motif due to the proximity of Ser91 to the cleavage site (lines 399-402). If S91D truly mimicked phosphorylation, we would predict it would facilitate processing rather than inhibit it. We do not have data comparing cleavage of S91D relative to wild type BMP4 and have not generated knock-in S91D mice to test this idea. While the reviewer's questions are intriguing, they cannot be answered by mutational analysis of the FAM20C site and are beyond the scope of the current studies that sought to understand the impact of human pS91C and pE93G mutations and cell biological implications. We have moved the models to an “Ideas and Speculation” subsection as suggested (lines 377-414) since these models are meant to provoke further thought rather than provide definitive answers based on our data.

      (3) The lack of an in vitro protease assay to test the effect of the S91 mutations on furin cleavage is problematic.

      Although we routinely perform in vitro cleavage assays with recombinant furin, we don’t believe they would be informative on how S91 phosphorylation or mutation of this residue impacts cleavage, since the in vitro synthesized substrate used in these assays is neither dimerized nor post-translationally modified, and cleavage would be tested in isolation from the endogenous trafficking environment that we propose influences cleavage.

      Reviewer #2 (Recommendations for the authors):

      (1) The impact of BMPS91A should be determined and paired with the S91D phosphomimic data to reveal if it causes proBMP4 to be cleaved prematurely and disturbs pSmad1 expression. Data for S93G should also be included.

      Our major goal in comparing the activity of S91D with S91C was to control for phosphorylation-independent defects caused by the deleterious introduction of a cysteine residue in S91C, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. We note that S91D has significantly higher activity than S91C, consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have revised the wording of this section to clarify this. Our models predict that S91D would be cleaved more efficiently than S91C or S91A, if it really mimics the endogenous phosphorylated state, rather than being cleaved prematurely. Our biochemical analysis compares cleavage of endogenous BMP4 in wild type and mutant MEFs. Generation of S91D, S91A or S93G mutant mice to compare cleavage is beyond the scope of the current work.

      (2) Is the distance between pS91 and Arg289 close enough to form a hydrogen bond? If so, might this interaction influence furin access?

      AI modeling does not provide high probability prediction of structures surrounding the furin motif (see Fig. S7) and thus we cannot comment on whether or not these residues are close enough to form a hydrogen bond. We have revised the wording of the discussion to state “This simple model building indicates the possibility of direct contact between pSer91 and Arg289, and that phosphorylation is required for furin to access the cleavage site, although we note that predictions surrounding the furin motif represent low probability conformations (Fig. S7) (lines 399-402).”

      (3) The genotypes in Figure 2 are labeled awkwardly. Consider labeling the headers for the three subsections of panels (A-F, G-L, and M-O) differently.

      We have revised Fig. 2 to clarify that the three subsections of panels are distinct, and to emphasize that the middle subsection represents views of the right and left side of the same embryo.

      (4) The tables should be reformatted. As is, the labeling is frequently cut off, and the numbers of expected and observed progeny should both be stated to aid the reader.

      We thank the reviewer for noting the formatting errors in the tables, which we have corrected. We have also changed the tables so that normal or abnormal mendelian distributions are reported as numbers of observed/expected progeny rather than numbers/percent observed progeny.

      Reviewer #3 (Public review):

      Summary:

      The authors describe important new biochemical elements in the synthesis of a class of critical developmental signaling molecules, BMP4. They also present a highly detailed description of developmental anomalies in mice bearing known human mutations at these specific elements.

      Strengths:

      Exceptionally detailed descriptions of pathologies occurring in mutant mice. Novel findings regarding the interaction of propeptide phosphorylation and convertase cleavage, both of which will move the field forward. Provocative hypothesis regarding furin access to cleavage sites, supported by Alphafold predictions.

      Weaknesses:

      Figure 6A presents two testable models for pre-release access of furin to cleavage sites since physical separation of enzyme from substrate only occurs in one model; could immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution. Because PC/substrate interactions are transient, whereas the bulk of furin and BMP4 is distributed throughout the secretory pathway, it is not possible to co-immunolocalize furin and BMP4 in vivo at present. Studies using more advanced cell biological techniques along with tagged proteins may enable us to test these hypotheses in the future.

      Reviewer #3 (Recommendations for the authors):

      This interesting paper presents new data on an important family of developmental signaling molecules, BMPs. Mutations at FAM20C consensus sites within BMP prodomains are known to cause birth defects. The authors have here explored differential effects of human mutations on hetero- and homodimer activity and maturation, issues that may well arise during human development. In addition to demonstrating the profound effect of these mutations on development in Xenopus and mice, the authors also show differential processing of BMP4 precursors bearing these mutations in MEF cells prepared from mutant embryos. Finally, they show that FAM20C plays a role in BMP4 prodomain processing with quite differing outcomes in homo- vs heterodimers, which they suggest is due to structural differences impacting furin access. While this latter idea remains speculative due to the lack of crystal structures (models are based on Alphafold) it is a highly promising line of work.

      The data are beautifully presented and will be of clear interest to all developmental biologists. Certain cell biology results may also extrapolate to other phosphorylated precursor molecules undergoing the interesting (and as yet unexplained) phenomenon of convertase cleavage immediately before secretion, for example, FGF23. I have only a few minor comments regarding the presentation, which is remarkably clear.

      (1) The introduction of BMP7 in the Abstract is abrupt. It should be described as a preferred dimerization partner for BMP4.

      Thank you for noting this. We have revised the first sentence of the abstract to better introduce BMP7 (lines 49-50).

      (2) In Figure 1A, what is the small light green box?

      This is a small fragment released from the prodomain by the second cleavage. We have clarified this in the introduction (lines 112-114) and in the legend to Figure 1 (lines 758-759).

      (3) In the Discussion it might be relevant to mention that FAM20C propeptide is not cleaved by convertases but by S1P (Chen 2021).

      We have added this information to clarify (lines 394-396).

      (4) Figure 3, define VSD; Figure 5, Endo H removes sugars only from immature (nonsialylated) sugars, not from all chains as implied. More importantly, EndoH and PNGase remove N-linked sugars, yet Results refer only to O-linked glycosylation.

      Thank you for noting these oversights. We have defined VSD in Figure 3. We have also revised the headers for Fig. 5 and for the relevant subsection of the results to include N-linked glycosylation and note in the results that EndoH removes only immature N-linked carbohydrates (lines 301-304).

      (5) Figure 5- for clarity, I suggest it be broken up into two larger panels labeled "Embryos" and "MEFs"

      Thank you for this suggestion, we have subdivided the Figure into two panels.

      (6) Figure 6A presents two testable models for pre-release access of furin to cleavage sites since the physical separation of the enzyme from substrate only occurs in one model; could confocal immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution, and PC/substrate interactions are transient whereas the bulk of both furin and BMP4 is in transit through the secretory pathway. For these reasons it is not possible to co-immunolocalize furin and BMP4 in vivo. Studies using advanced cell biological techniques may enable us to test these hypotheses in the future.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses the eye lens as a model to investigate basic mechanisms in the Fgf signaling pathway. Understanding Fgf signaling is of broad importance to biologists as it is involved in the regulation of various developmental processes in different tissues/organs and is often misregulated in disease states. The Fgf pathway has been studied in embryonic lens development, namely with regards to its involvement in controlling events such as tissue invagination, vesicle formation, epithelium proliferation and cellular differentiation, thus making the lens a good system to uncover the mechanistic basis of how the modulation of this pathway drives specific outcomes. Previous work has suggested that proteins, other than the ones currently known (e.g., the adaptor protein Frs2), are likely involved in Fgfr signaling. The present study focuses on the role of Shp2 and Shc1 proteins in the recruitment of Grb2 in the events downstream of Fgfr activation.

      Strengths:

      The findings reveal that the juxtamembrane region of the Fgf receptor is necessary for proper control of downstream events such as facilitating key changes in transcription and cytoskeleton during tissue morphogenesis. The authors conditionally deleted all four Fgfrs in the mouse lens, which resulted in molecular and morphological lens defects, most importantly, preventing the upregulation of the lens induction markers Sox2 and Foxe3 and the apical localization of F-actin, thus demonstrating the importance of Fgfrs in early lens development, i.e. during lens induction. They also examined the impact of deleting Fgfr1 and 2 on the following stage, i.e. lens vesicle development, which could be rescued by expressing constitutively active KrasG12D. By using specific mutations (e.g. Fgfr1ΔFrs lacking the Frs2 binding domain and Fgfr2LR harboring mutations that prevent binding of Frs2), it is demonstrated that the Frs2 binding site on Fgfr is necessary for specific events such as morphogenesis of lens vesicle. Further, by studying Shp2 mutations and deletions, the authors present a case for Shp2 protein to function in a context-specific manner in the role of an adaptor protein and a phosphatase enzyme. Finally, the key surprising finding from this study is that downstream of Fgfr signaling, Shc1 is an important alternative pathway - in addition to Shp2 - involved in the recruitment of Grb2 and in the subsequent activation of Ras. The methodologies, namely, mouse genetics and state-of-the-art cell/molecular/biochemical assays are appropriately used to collect the data, which are soundly interpreted to reach these important conclusions. Overall, these findings reveal the flexibility of the Fgf signaling pathway and its downstream mediators in regulating cellular events. This work is expected to be of broad interest to molecular and developmental biologists.

      Weaknesses:

      A weakness that needs to be discussed is that Le-Cre depends on Pax6 activation, and hence its use in specific gene deletion will not allow evaluation of the requirement of Fgfrs in the expression of Pax6 itself. But since this is the earliest Cre available for deletion in the lens, mentioning this in the discussion would make the readers aware of this issue.

      Reviewer #2 (Public review):

      Summary

      I have reviewed the revised manuscript submitted by Wang et al., which is entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development". In this paper, the authors first examined lens phenotypes in mice with Le-Cre-mediated knockdown (KD) of all four FGFR (FGFR1-4), and found that pERK signals, Jag1 and foxe3 expression are absent or drastically reduced, indicating that FGF signaling is essential for lens induction. Next, the authors examined lens phenotypes of FGFR1/2-KD mice and found that lens fiber differentiation is compromised and that proliferative activity and cell survival are also compromised in lens epithelium. Interestingly, Kras activation rescues defects in lens growth and lens fiber differentiation in FGFR1/2-KD mice, indicating that Ras activation is a key step for lens development, downstream of FGF signaling. Next, the authors examined the role of Frs2, Shp2 and Grb2 in FGF signaling for lens development. They confirmed that lens fiber differentiation is compromised in FGFR1/3-KD mice combined with Frs2-dysfunctional FGFR2 mutants, which is similar to lens phenotypes of Grb2-KD mice. However, lens defects are milder in mice with Shp2YF/YF and Shp2CS mutant alleles, indicating that involvement of Shp2 is limited for the Grb2 recruitment for lens fiber differentiation. Lastly, the authors showed new evidence on the possibility that another adapter protein, Shc1, promotes Grb2 recruitment independent of Frs2/Shp2-mediated Grb2 recruitment.

      Strength

      Overall, the manuscript provides valuable data on how FGFR activation leads to Ras activation through the adapter platform of Frs2/Shp2/Grb2, which advances our understanding on complex modification of FGF signaling pathway. The authors applied a genetic approach using mice, whose methods and results are valid to support the conclusion. The discussion also well summarizes the significance of their findings.

      Weakness

      The authors found that the new adaptor protein Shc1 is involved in Grb2 recruitment in response to FGF receptor activation. However, the main data on Shc1 are only histological sections and statistical evaluation of lens size. In the revised manuscript, the authors did not address my major concern that cellular-level data are missing; the current data are not sufficient to fully support their main conclusion on the involvement of Shc1 in Grb2 recruitment of FGF signaling for lens development. Since the title of this manuscript is that Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development, it is important to provide the cellular-level evidence on Shc1.

      Reviewer #3 (Public review):

      Summary:

      The manuscript entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development" by Wang et al., investigates the molecular mechanism used by FGFR signaling to support lens development. The lens has long been known to depend on FGFR-signaling for proper development. Previous investigations have demonstrated that FGFR signaling is required for embryonic lens cell survival and for lens fiber cell differentiation. The requirement of FGFR signaling for lens induction has remained more controversial as deletion of both Fgfr1 and Fgfr2 during lens placode formation does not prevent the induction of definitive lens markers such as FOXE3 or αA-crystallin. Here the authors have used the Le-Cre driver to delete all four FGFR genes from the developing lens placode demonstrating a definitive failure of lens induction in the absence of FGFR-signaling. The authors focused on FGFR1 and FGFR2, the two primary FGFRs present during early lens development and demonstrated that lens development could be significantly rescued in lenses lacking both FGFR1 and FGFR2 by expressing a constitutively active allele of KRAS. They also showed that the removal of pro-apoptotic genes Bax and Bak could also lead to a substantial rescue of lens development in lenses lacking both FGFR1 and FGFR2. In both cases, the lens rescue included both increased lens size and the expression of genes characteristic of lens cells.

      Significantly the authors concentrated on the juxtamembrane domain, a portion of the FGFRs associated with FRS2. Previous investigations have demonstrated the importance of FRS2 activation for mediating a sustained level of ERK activation. FRS2 is known to associate both with GRB2 and SHP2 to activate RAS. The authors utilized a mutant allele of Fgfr1, lacking the entire juxtamembrane domain (Fgfr1ΔFrs) and an allele of Fgfr2 containing two point mutations essential for Frs2 binding (Fgfr2LR). When combining three floxed alleles and leaving only one functional allele (Fgfr1ΔFrs or Fgfr2LR) the authors got strikingly different phenotypes. When only the Fgfr1ΔFrs allele was retained, the lens phenotype matched that of deleting both Fgfr1 and Fgfr2. However, when only the Fgfr2LR allele was retained the phenotype was significantly milder, primarily affecting lens fiber cell differentiation, suggesting that something other than FRS2 might be interacting with the juxtamembrane domain to support FGFR signaling in the lens. The authors also deleted Grb2 in the lens and showed that the phenotype was similar to that of the lenses only retaining the Fgfr2LR allele, resulting in a failure of lens fiber cell differentiation and decreased lens cell survival. However, mutating the major tyrosine phosphorylation site of GRB2 did not affect lens development. The authors additionally investigated the role of SHP2 in lens development by either deleting SHP2 or by making mutations in the SHP2 catalytic domain. The deletion of the SHP2 phosphatase activity did not affect lens development as severely as total loss of SHP2 protein, suggesting a function for SHP2 outside of its catalytic activity. Although the loss of Shc1 alone has only a slight effect on lens size and pERK activation in the lens, the authors showed that the loss of Shc1 exacerbated the lens phenotype in lenses lacking both Frs2 and Shp2. The authors suggest that SHC1 binds to the FGFR juxtamembrane domain, allowing for the recruitment of GRB2 independently of FRS2.

      Strengths:

      (1) The authors used a variety of genetic tools to carefully dissect the essential signals downstream of FGFR signaling during lens development.

      (2) The authors made a convincing case that something other than FRS2 binding mediates FGFR signaling in the juxtamembrane domain.

      (3) The authors demonstrated that although both the adaptor function and phosphatase activity of SHP2 are required for embryonic survival, neither of these activities is absolutely required for lens development.

      (4) The authors provide more information as to why FGFR loss has a phenotype much more severe than the loss of FRS2 alone during lens development.

      (5) The authors followed up their work analyzing various signaling molecules in the context of lens development with biochemical analyses of FGF-induced phosphorylation in murine embryonic fibroblasts (MEFs).

      (6) In general, this manuscript represents a Herculean effort to dissect FGFR signaling in vivo, with biochemical backing from cell culture experiments in vitro.

      Weaknesses:

      (1) The authors demonstrate that the loss of FGFR1 and FGFR2 can be compensated by a constitutive active KRAS allele in the lens and suggest that FGFRs largely support lens development only by driving ERK activation. However, the authors also saw that lens development was substantially rescued by preventing apoptosis through the deletion of BAK and BAX. To my knowledge, the deletion of BAK and BAX should not independently activate ERK. The authors do not show whether ERK activation is restored in the BAK/BAX deficient lenses. Do the authors suggest the FGFR3 and/or FGFR4 provide sufficient RAS and ERK activation for lens development when apoptosis is suppressed? Alternatively, is it the survival function of FGFR-signaling as much as a direct effect on lens differentiation?

      (2) Do the authors suggest that GRB2 is required for RAS activation and ultimately ERK activation? If so, do the authors suggest that ERK activation is not required for FGFR-signaling to mediate lens induction? This would follow considering that the GRB2 deficient lenses lack a problem with lens induction.

      (3) The increase in p-Shc is only slightly higher in the Cre FGFR1f/f FGFR2r/LR than in the FGFR1f/Δfrs FGFR2f/f. Can the authors provide quantification?

      (4) The authors have not shown directly that Shc1 binds to the juxtamembrane region of either Fgfr1 or Fgfr2.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In the revised manuscript, the authors have responded to my recommendations to revise the original manuscript, except for three suggestions below.

      (1) The original recommendation: Results (page 6, line 8): The authors mentioned "we observed .... expression of Foxe3 in ...mutant lens cells (Figure 1E, arrows). However, Foxe3-expressing lens cells are a very small population in Figure 1E. It is important to state the decreased number of Foxe3-expressing lens cells in FGFR1/2 mutants. In addition, I would like to request the authors to show histograms indicating sample size and statistical analysis for marker expression: Foxe3 (Figure 1E), Prox1 and aA-crystallin (Fig. 1F), cyclin D1 and TUNEL (Fig. 1G) and pmTOR and pS6 (Supplementary figure 1B).

      Author's response: We added a statement indicating that the number of Foxe3-expressing cells is reduced in FGFR1/2 mutants, which is now quantified in Fig. 1H. Quantifications for Cyclin D1 and TUNEL are now shown in Fig. 1I and J, respectively. However, we chose not to quantify Prox1, αA-crystallin, pmTOR, and pS6, as the FGFR1/2 mutants showed no staining for these markers.

      My recommendation: Although the authors have responded by revising the quantification of Foxe3-expressing cells, Cyclin D1 and TUNEL, they did not conduct statistical analysis of Prox1, αA-crystallin, pmTOR, and pS6, because of the absence of these marker signals. I understand that the absence of signal makes statistical analysis not meaningful. However, it is still important to indicate how many times the authors repeated the experiments to confirm the same result. Please indicate the number of biological replicates or independent experiments in the figure legends, for example "Biological replicates, n=3" or "Three independent experiments show similar results". As for pS6 labeling, there seems to be a weak signal in Supplementary Figure 1B, so please show statistical analysis to indicate its histogram.

      We have added the number of biological replicates for Prox1 and αA staining in the legend of Fig. 1. The reviewer is correct that there is weak staining of pS6, and also pmTOR. The quantifications of pS6 and pmTOR staining are now shown in Supplementary Fig. 1C and D.

      (2) The original recommendation: Results (page 6, line 19- page 7, line 6): The authors showed that inducible expression of constitutive active Kras, KrasG12D, using Le-Cre, recovered lens size to the half level of wild-type control. However, in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D, pERK was detected in the most posterior edge of the lens fiber core, whereas pERK was detected in the broader area of the lens in control. Furthermore, pMEK was detected in the whole lens of mice with Le-Cre; FGFR1/2f/f; and LSL-KrasG12D, whereas pMEK was detected only in the lens epithelial cells at the equator. So, the spatial profile of pERK and pMEK expression was different from those of wild-type, although the authors observed that Prox1 and Crystallin expression are normally induced in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D. I wonder whether the lens normally develops in mice with Le-Cre; LSL-KrasG12D? Is the lens growth enhanced in mice with Le-Cre; LSL-KrasG12D? Please add the panels of mice with Le-Cre; LSL-KrasG12D in Figure 2B and 2C. In addition, I wonder whether apoptosis is suppressed in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D?

      Authors' response: As we previously reported (Developmental Biology 355, 2011, 12-20), Le-Cre; LSL-KrasG12D did not lead to enhanced lens growth. While we agree that including images of Le-Cre; LSL-KrasG12D as controls in Fig. 2B and C and evaluating apoptosis in Le-Cre; FGFR1/2f/f; LSL-KrasG12D mutants would be appropriate, we regretfully no longer have these animals available to conduct these experiments.

      My recommendation: I would like to suggest the authors conduct these experiments again, because the recovery of lens formation by Bax/Bak KD in Fgfr1/2 KD mice (Fig. 2F) suggests that KrasG12D activates the AKT-mediated cell survival pathway as well as the MEK/MAPK pathway downstream of the FGF signaling pathway. Regarding the availability of mouse strains, in general, it is necessary to keep animal strains available for sincere response to reviewers' suggestions. Please clarify why these strains are now not available and justify the reason in the response to reviewers' recommendations.

      We acknowledge the reviewer's suggested experiments. However, our research utilized multiple mouse strains that are costly to maintain, a challenge that was exacerbated during and after the COVID-19 pandemic. Unfortunately, we no longer have access to the specific mouse strains required to conduct these additional studies.

      (3) The original recommendation: Figures 7E, and 7F: The authors showed that lens morphology and lens size evaluation in genetic combinations: control, Frs2/Shc1 KD, Frs2/Shp2 KD, and Frs2/Shp2/Shc1 KD. However, I would like to request the authors to show more detailed data in these genetic combinations, for example, pERK, foxe3, Maf, Prox1, Jag1, p57, cyclin D3, g-crystallin, and TUNEL.

      Authors' response: Unfortunately, we no longer have these mutant mice to perform this detailed staining.

      My recommendation: As I mentioned in the statement on weakness above, it is important to provide the cellular-level evidence to support the main conclusion on the involvement of Shc1 in Grb2 recruitment of FGF signaling for lens development, because this is the main novel finding in this manuscript. Regarding the availability of mouse strains, it is generally necessary to keep animal strains available for sincere response to reviewers' suggestions. Please clarify why these strains are now not available and justify the reason in the response to the reviewers' suggestions.

      We regret that we did not anticipate these experiments suggested by the reviewer. Unfortunately, we are unable to perform these studies as we no longer maintain the required mouse strains in our colony.

      Reviewer #3 (Recommendations for the authors):

      The changes made by the authors improved the manuscript. I have no further suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Kong Fang et al describe a robust pipeline for the isolation of small extracellular vesicles through a combination of size exclusion chromatography and miniaturized density gradient separation. Subsequently, they prove that the method is reproducible and suitable for small-volume operations while at the same time not compromising the quality of vesicles.

      Strengths:

      The paper narrates a robust method for purifying high-quality sEVs from small amounts of blood plasma. They also demonstrate that through this approach, they can derive sEVs without compromising the protein composition, integrity of the vesicles, or contamination with other proteins or lipids.

      Weaknesses:

      The paper is a nice summary of how to enrich sEVs from blood samples. Although well performed and substantiated with data, the paper primarily deals with method development and optimisation.

      We agree with the reviewer's assessment that this paper primarily focuses on the development and optimization of a method. Using this robust technique for isolating small extracellular vesicles (sEVs) from small blood volumes, our future research will investigate sEVs isolated from clinical samples, with a particular focus on their role in various diseases.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors manage to optimize a simple and rapid protocol using SEC followed by DGUC to isolate sEVs with adequate purity and yield from small volumes of plasma. Isolated fractions containing sEVs using SEC, DGUC, SEC-DGUC, and DGUC-SEC are compared in terms of their yield, purity, surface protein profile, and RNA content. Although the combined use of these methodologies has already been evaluated in previous works, the authors manage to adapt them for the use of small volumes of plasma, which allows working in 1.5 mL tubes and reducing the centrifugation time to 2 hours.

      The authors finally find that although both the SEC-DGUC and DGUC-SEC combinations achieve isolates with high purity, the SEC-DGUC combination results in higher yields.

      This work provides an interesting tool for the rapid obtention of sEVs with sufficient yield and purity for detailed characterization which could be very useful in research and clinical therapy.

      Strengths:

      - The work is well-written and organized.

      - The authors clearly state the problem they want to address, that is, optimizing a method that allows sEV to be isolated from small volumes of plasma.

      - Although these methodologies have been tested in previous works, the authors manage to isolate sEVs of high purity and good performance through a simple and fast methodology.

      - The characteristics of all isolated fractions are exhaustively analyzed through various state-of-the-art methodologies.

      - They present a good interpretation of the results obtained through the methodologies used.

      Weaknesses:

      - Lack of references that support some of the results obtained.

      - Although this work focuses on comparing different techniques and their combinations to find an optimal option, the authors do not use any statistical method that reliably shows the differences between these techniques, except when repeatability is measured.

      We appreciate the reviewer's insightful comments and will incorporate the suggested missing references. We acknowledge that we did not perform statistical analyses when comparing the differences among the three methods. Nevertheless, the superiority of the SEC-DGUC method is evident from observations based on several independent characterization methods, including Cryo-EM, TEM, western blot, and total RNA quantification.

      Firstly, repeated Cryo-EM observations consistently confirm that the SEC-alone method shows severe lipoprotein contamination while the SEC-DGUC method drastically reduces such lipoprotein contamination. In comparing the SEC-DGUC and DGUC-SEC methods, multiple independent characterization methods showed that the SEC-DGUC method yields a significantly greater quantity of sEVs: 1) The western blot experiment showed much higher signal intensity for all four tested sEV markers (CD9, CD63, CD81, and TSG101), with estimated concentrations approximately 2.1, 2.1, 4.7, and 4.2 times higher than the DGUC-SEC method. 2) The total RNA analysis showed that SEC-DGUC-1 contained more than 4 times the total amount of RNA compared to DGUC-SEC-PF. 3) Establishing the normalization baseline, particle size distributions in SEC-DGUC-1 and DGUC-SEC-PF measured by TEM were found to be similar, suggesting comparable purity and distribution of the captured sEVs. For comparison purposes, within each independent characterization method, the same plasma source and total plasma volume were used, while across different methods, different plasma sources were used. These independent characterization methods have consistently demonstrated the superiority of the SEC-DGUC method over the DGUC-SEC or SEC-alone methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, this work is elegantly designed and supported by data, which would motivate more studies related to blood-derived microvesicles in the context of infectious and systemic diseases. Overall, the manuscript is well-written and explained in sufficient detail. I only have minor comments.

      (1) Recruitment of volunteers for blood/plasma collection: there is a need for a statement that this was in accordance with ethical and biosafety regulations of the Institute/Clinic.

      We added two sentences at the beginning of the Blood Collection section (under Materials and methods): “All procedures involving peripheral blood specimens were approved by the Singapore National Health Group Domain Specific Review Board (the central ethics committee) and were mutually recognized by the Nanyang Technological University Institutional Review Board (IRB#2018/00671). All blood specimens were de-identified prior to their use in the experiments.”

      (2) Since this is a method development and validation article, it would be good to include an image of the iodixanol gradient with the high-density sEV zone, after centrifugation.

      We have incorporated an image after centrifugation in Supplementary Figure 3.

      (3) Although several sEV markers are shown in Figure 7A, flotillin is missing in this figure which was part of Figure 6B. Does flotillin show a different pattern? Flotillin is a DRM-associated marker, and hence may behave differently, would be interesting to add any insights.

      We appreciate the reviewer’s careful observation. In Figure 6B, Flotillin was used to confirm the presence of sEVs in different density zones. However, for the purpose of comparing the yield between the SEC-DGUC and DGUC-SEC methods, as shown in Figure 7A, Flotillin was not included in the western blot analysis. No obvious pattern changes were observed in other sEV markers tested in both Figures 6B and 7A.   

      (4) Methods section of LC/MS analysis- which protein database was used for protein identification?

      We added the following sentence at the end of the LC/MS analysis section: “The protein database used for protein identification was Uniprot Human.”

      Reviewer #2 (Recommendations For The Authors):

      In line 43 some references are needed.

      We added this reference: EL Andaloussi, S., Mäger, I., Breakefield, X. et al. Extracellular vesicles: biology and emerging therapeutic opportunities. Nat Rev Drug Discov 12, 347–357 (2013). https://doi.org/10.1038/nrd3978

      In line 107, please avoid using short forms such as "it's".

      We have revised that to “it is.”

      In line 153: "...separates low-density particles from those of high density, but a considerable amount of..." the word "but" should not be in the sentence.

      We have removed “but” in this sentence.

      In line 181 the authors establish that "Notably, SEC-PF exhibited a high level of ApoB and low expression of sEV markers." Is there any explanation for this?

      SEC-PF represents the eluate from the SEC step, collected before the DGUC step. This fraction contains a mixture of lipoproteins and sEVs. Due to the overwhelming abundance of lipoproteins compared to sEVs, the western blot predictably shows a high level of ApoB with minimal expression of sEV markers. This highlights that SEC alone effectively reduces plasma protein content but does not efficiently remove lipoproteins. Figure 6C further illustrates this point, as cryo-EM images of SEC-PF reveal the presence of sEVs, which are vastly outnumbered by lipoproteins.

      In line 198, the sentence "Theoretically, the DGUC-SEC protocol should also effectively isolate sEVs from plasma" needs to be supported by references.

      See for instance:

      - Holcar M, Ferdin J, Sitar S, Tušek-Žnidarič M, Dolžan V, Plemenitaš A, Žagar E, Lenassi M. 2020. Enrichment of plasma extracellular vesicles for reliable quantification of their size and concentration for biomarker discovery. Sci Rep 10:21346. doi:10.1038/s41598-020-78422-y.

      - Jia Y, Yu L, Ma T, Xu W, Qian H, Sun Y, Shi H. 2022. Small extracellular vesicles isolation and separation: Current techniques, pending questions and clinical applications. Theranostics 12:6548-6575. doi:10.7150/thno.74305

      - Vergauwen G, Dhondt B, Van Deun J, De Smedt E, Berx G, Timmerman E, Gevaert K, Miinalainen I, Cocquyt V, Braems G, Van den Broecke R, Denys H, De Wever O, Hendrix A. 2017. Confounding factors of ultrafiltration and protein analysis in extracellular vesicle research. Sci Rep 7:2704. doi:10.1038/s41598-017-02599-y

      We have added this reference: Holcar M, Ferdin J, Sitar S, Tušek-Žnidarič M, Dolžan V, Plemenitaš A, Žagar E, Lenassi M. 2020. Enrichment of plasma extracellular vesicles for reliable quantification of their size and concentration for biomarker discovery. Sci Rep 10:21346. https://doi.org/10.1038/s41598-020-78422-y.  

      In line 309 the authors establish that "NTA measured size distributions displayed well-overlapped histograms of particles". Is it possible for the authors to analyze this overlap using a statistical test such as a chi-squared test?

      We have conducted a statistical analysis of the histogram similarities using the Jensen-Shannon Divergence (JSD) method. This is reflected in the manuscript under the results section, “Repeatability and reliability of the SEC-DGUC protocol”, where we state: “We then compared size distributions for each plasma fraction using Jensen-Shannon Divergence (JSD). The JSD values, which are well below 0.1 (Figure 10B), indicate a consistent population of isolated particles, as further supported by Supplementary Figure 8.” Additionally, we included JSD values in the legend of Figure 10B: “JSD values for SEC-DGUC-1 to 4 are 0.015, 0.006, 0.001, and 0.002, indicating strong similarities among the histograms.” These additions demonstrate the robustness and repeatability of the SEC-DGUC protocol.
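
      As an illustration of the comparison described above, the following is a minimal sketch of a Jensen-Shannon Divergence calculation between two size histograms. It assumes both histograms share the same bins and uses a base-2 logarithm, so that the value lies between 0 and 1; the response does not state which base or software was used. The function name and the bin counts are hypothetical and are not data from the study.

```python
import numpy as np

def jensen_shannon_divergence(counts_p, counts_q, base=2.0):
    """JSD between two histograms defined on the same bins."""
    p = np.asarray(counts_p, dtype=float)
    q = np.asarray(counts_q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("histograms must share the same bins")
    # Convert raw counts to probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical particle counts per size bin for two repeat isolations.
run_a = [2, 10, 35, 40, 10, 3]
run_b = [3, 12, 33, 38, 11, 3]
jsd = jensen_shannon_divergence(run_a, run_b)
print(f"JSD = {jsd:.4f}")
```

      With base 2, identical histograms give 0 and histograms with no shared bins give 1, which is why values well below 0.1 are read as strong similarity.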

      In line 360, "lasts ~ 16 hours or more." This statement needs a reference that supports this time.

      We have added this reference: Vergauwen, G. et al. Robust sequential biophysical fractionation of blood plasma to study variations in the biomolecular landscape of systemically circulating extracellular vesicles across clinical conditions. J Extracell Vesicles 10, e12122 (2021).

      In line 399, the reference format is different from the previously used format.

      This is corrected. We thank the reviewer for this careful examination.

      Line 466: This sentence is not quite clear. It can be understood that for every 0.5 mL of plasma, 2 mL of particle fraction are obtained and that for 6 mL of plasma, this method will give a total volume of 24 mL. However, it is not clear what is meant by the fact that it has been concentrated to 6 mL. While one can assume that those final 6 mL concentrates come from the initial 24 mL, perhaps the way this sentence was worded was not appropriate. I would recommend rewriting it for a simpler interpretation of how this method was performed.

      We have changed the sentence to: “For the DGUC experiment using the 12 ml tube, 24 ml of PFs were obtained from 6 ml of plasma and subsequently concentrated to 6 ml. The 6 ml of concentrated PFs were then transferred to a Beckman Coulter ultra-clear centrifuge tube (344059, Beckman Coulter, USA) for further processing.”

      Line 519: The authors established a second dilution to avoid absorbance values above 1.2. Is there any justification for this value, taking into account that the Lambert-Beer law presents more precision in the absorbance range of 0.2 to 0.8?

      We have added this reference: https://diagnostic.serumwerk.com/wp-content/uploads/2021/05/V05-Serumwerk.pdf

      Line 519-520: "Also included were water and 0.25 M sucrose as blanks". Perhaps authors could consider rephrasing this sentence.

      We have changed the sentence to: “The absorbance measurements were made against water and 0.25 M sucrose blanks.”

      In line 520, the sentence must say "each sample was made by triplicate".

      We have changed the sentence to: “Each sample was prepared by triplicate to reduce error.” We thank the reviewer for this suggestion.

      Line 673: The phrase "0.1% formic acid in 100% ACN" would be better, in my opinion, if it said "0.1% formic acid in ACN".

      Yes, these two expressions have the same meaning. However, to ensure clarity, we have updated the description to “0.1% formic acid in ACN.” We thank the reviewer for this suggestion.

      Supplementary Figure 1: in the Figure caption there is an error in the numbering: at the end, where it is written (E), it should be (F). Please, correct this.

      We have made the necessary correction and sincerely appreciate the reviewer’s attentiveness.

      Supplementary Figure 5: Some sEVs are hard to visualize due to poor image resolution. Is there any possibility for the authors to enhance these images?

      We thank the reviewer for this valuable comment. To improve the visual clarity of the images, we have opted to display four sub-figures instead of nine.

    1. Author response:

      We appreciate the effort the reviewers have put into evaluating our work, and will take the opportunity to revise and improve our submission. In response to the reviewer's comments, we will carefully revisit our manuscript to address the concerns they have raised. Specifically, we will ensure that our revised version is coherent with our annotations and public databases, clarify any discrepancy between the investigated proteins and gene models, and re-examine our discussion of the evolutionary implications in light of their suggestions. We are confident that these revisions will strengthen our work and provide a clearer understanding of our research findings.

    1. Author response:

      We sincerely thank all three reviewers for their time, comments, and valuable suggestions, which will help improve our manuscript. Below, we provide preliminary remarks addressing some of the key issues that have been raised.

      Reviewer 1:

      We agree with the reviewer on the challenge of accurately mapping reads to multigene families. We carefully considered this issue and addressed it by evaluating the performance of multiple aligners using simulated RNA-seq reads. Our results indicate that kallisto performs particularly well in this context, outperforming widely used aligners such as Bowtie2 and STAR. This is likely due to kallisto’s expectation-maximization (EM) algorithm (described in the Materials and Methods section), which employs a probabilistic model to assign reads from similar transcripts. Previous studies have demonstrated the effectiveness of this approach in quantifying highly repetitive sequences, such as transposons (doi.org/10.1093/bioinformatics/btv422). In the revised manuscript, we are considering the inclusion of a supplementary figure to further support the selection of the mapping algorithm.
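
      The idea behind assigning ambiguous reads with an expectation-maximization procedure can be sketched with a toy example. This is not kallisto's implementation: it assumes a precomputed read-by-transcript compatibility matrix, ignores transcript lengths, and uses hypothetical names and made-up read counts. It only shows how reads shared between similar genes are split in proportion to the current abundance estimates.

```python
import numpy as np

def em_abundances(compat, n_iter=200, tol=1e-10):
    """Estimate relative abundances from a read-by-transcript 0/1 matrix.

    compat[i, j] == 1 means read i is compatible with transcript j.
    Every read must be compatible with at least one transcript.
    """
    compat = np.asarray(compat, dtype=float)
    n_reads, n_tx = compat.shape
    theta = np.full(n_tx, 1.0 / n_tx)  # start from uniform abundances
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        weights = compat * theta
        weights /= weights.sum(axis=1, keepdims=True)
        # M-step: new abundance = expected fraction of reads per transcript.
        new_theta = weights.sum(axis=0) / n_reads
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta

# Toy family of three similar genes A, B, C (hypothetical counts):
# 6 reads unique to A, 2 unique to B, 4 shared by A and B, none for C.
reads = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 2 + [[1, 1, 0]] * 4
theta = em_abundances(reads)
print(np.round(theta, 3))
```

      In this toy case the shared reads end up divided 3:1 between A and B, following the ratio set by the uniquely mapping reads, rather than being discarded or split evenly.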

      Reviewer 2:

      We believe that obtaining experimental evidence on the influence of multiple multigene families would represent a significant advancement in the field. However, we would like to emphasize that this is a short communication centered on a specific and biologically relevant observation within a single multigene family. The manuscript is not intended to comprehensively address all aspects of the experiment but rather to highlight what we consider an important biological phenomenon with potential functional implications.

      The influence of phenotypic heterogeneity and its possible advantages under environmental pressures has been previously proposed for Trypanosoma cruzi, related trypanosomatids, and other biological systems, ranging from bacteria to tumors (Seco-Hidalgo 2015, doi: 10.1098/rsob.150190 and Luzak 2021, doi: 10.1146/annurev-micro-040821-012953, for a comprehensive review on this topic). While the reviewer is correct in noting that our model does not demonstrate a functional role for TcS heterogeneity, the experimental approaches required to address this question in a large multigene family are highly complex and beyond the scope of this study. However, we acknowledge the importance of clarifying that the proposed functional implications remain speculative, so we will revise the manuscript accordingly.

      As the reviewer suggests, in the revised version of the manuscript, we will include additional analyses on the characteristics of frequently expressed TcS genes to identify common features that may explain their expression patterns.

      We appreciate the reviewer’s comments and suggestions regarding the clarity of methodological choices and the explanation of key concepts. Accordingly, we will refine the description of our methodology and ensure that our figures are more intuitive and self-explanatory.

      Reviewer 3:

      We recognize the limitations imposed by gene dropout in our data, as highlighted by the reviewer. In the manuscript, we have aimed to be transparent about this issue and discussed its impact in two separate sections (lines 110–121 and 175–181). To enhance clarity, we will revise these paragraphs to provide a more comprehensive discussion of this limitation. Unfortunately, gene dropout is an inherent limitation of 10x Genomics data. Trypanosomatids are not an exception in this regard, and the general metrics of the single-cell RNA-seq data in other reports are equivalent to those obtained in our experiment.

      Despite this important limitation, we believe that our comparative analyses (the contrast between TcS and ribosomal protein expression) provide valuable insights into a biological phenomenon with potential functional relevance for the parasite. Furthermore, we are actively working on generating single-cell RNA-seq data using alternative methodologies that improve gene dropout rates. We anticipate that these future studies will help clarify the extent of the phenomenon described in this work.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We thank the reviewer for their thoughtful comments. We will clarify the model assumptions and the feature selection process to make it more understandable. To clarify, the performance of glmSMA does not depend on cell type. For some rare cell types, the small number of cells can lead to a drop in performance. To better illustrate our results and reduce cell type-specific biases, we will shuffle and randomly sample the cell types.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, with regularization parameters, they can be tuned to balance sparsity and smoothness adequately but may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects - a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that non-linear relationships can be present in complex tissues and may not be fully captured by a linear model.

      Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo.

      There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns. In tissues where non-linear relationships are more prevalent—such as the human PDAC sample—our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary. Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults.

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for accurate prediction. To address this more clearly, we will revise the manuscript to include detailed descriptions of our feature selection process and demonstrate how varying the number of selected features impacts performance.
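
      For readers unfamiliar with Moran's I, the following minimal sketch shows how the statistic scores a gene's spatial autocorrelation. It assumes a symmetric binary adjacency matrix over spots and uses a hypothetical chain of six spots with made-up expression values; it is not the feature selection code used in the study.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for one gene given a spot-by-spot spatial weight matrix.

    Undefined (division by zero) if the gene has constant expression.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()  # centered expression
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical layout: six spots on a line, neighbors share an edge.
n_spots = 6
adjacency = np.diag(np.ones(n_spots - 1), 1) + np.diag(np.ones(n_spots - 1), -1)

gradient_gene = [1, 2, 3, 4, 5, 6]     # smooth spatial trend
alternating_gene = [1, 0, 1, 0, 1, 0]  # neighbors always differ

i_gradient = morans_i(gradient_gene, adjacency)
i_alternating = morans_i(alternating_gene, adjacency)
print(i_gradient, i_alternating)
```

      Genes with a smooth spatial trend score positive and genes whose neighbors disagree score negative, so ranking genes by this value favors spatially informative markers.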

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used.

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. To mitigate this issue, we will implement shuffling and sampling procedures to reduce potential bias introduced by rare cell types.

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. While our current model focuses on inferring spatial relationships, we may add cell–cell communication analyses in the future.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. We will include the resolution and the number of cells assigned to each spot in future versions. In our framework, each cell is mapped to one or more spatial locations with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain. For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. The users can also see the entire regularization path by varying the penalty terms. 
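
      To make the role of the two penalties concrete, the following is a minimal sketch of a network-regularized linear fit for one cell. It is an assumed formulation, not the glmSMA implementation: the objective 0.5*||y - Xw||^2 + lam1*||w||_1 + 0.5*lam2*w'Lw, the proximal-gradient solver, the clipping and normalization step that turns weights into a probability vector, and all names and data are hypothetical. Here X holds marker-gene expression per spot, y is the cell's expression, and L is the graph Laplacian of the spot neighborhood graph.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def network_regularized_fit(X, y, L, lam1, lam2, n_iter=2000):
    """Minimize 0.5*||y - X w||^2 + lam1*||w||_1 + 0.5*lam2 * w^T L w."""
    H = X.T @ X + lam2 * L                    # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(H).max()  # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = H @ w - X.T @ y
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Hypothetical data: 20 marker genes, 5 spots arranged on a line.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = X[:, 2].copy()                            # cell that matches spot 2
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A                # graph Laplacian of the chain

w = network_regularized_fit(X, y, L, lam1=0.01, lam2=0.01)
p = np.clip(w, 0.0, None)
p = p / p.sum()                               # assumed probability-like output

# A very large L1 penalty drives every weight to zero.
w_sparse = network_regularized_fit(X, y, L, lam1=1e6, lam2=0.01)
print(np.round(p, 3))
```

      Sweeping lam1 and lam2 over a grid and refitting would trace out a regularization path of the kind mentioned above, with small penalties concentrating a cell on one spot and larger smoothness penalties spreading it over neighboring spots.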

      Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3

      Wei, R. et al. (2022) ‘Spatial charting of single-cell transcriptomes in tissues’, Nature Biotechnology, 40(8), pp. 1190–1199. doi:10.1038/s41587-022-01233-1. 

      Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Thank you for recognizing our contribution. Our goal was to develop a method that achieves higher spatial resolution in mapping single-cell data compared to existing tools. We are encouraged by the results and will continue to refine the approach to improve accuracy and generalizability across platforms and tissue types.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Thank you for this comment. We believe that evaluating our method across diverse tissue types—such as the mouse cortex, human PDAC, and intestinal villus—demonstrates its robustness and broad applicability. We plan to continue expanding these evaluations to additional tissue contexts and species to further validate the method’s generalizability.

      Weakness:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data. However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provide the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      - 10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      - Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      - Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.
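
      The KL divergence comparison used for the 10x Visium data can be sketched as follows. This is a minimal illustration under stated assumptions: the cell type counts are made up, the function name is hypothetical, and the small pseudocount added to avoid division by zero is our own choice rather than a documented part of the evaluation.

```python
import numpy as np

def kl_divergence(reference_counts, predicted_counts, pseudocount=1e-9):
    """KL(reference || predicted) between two cell type count vectors."""
    p = np.asarray(reference_counts, dtype=float) + pseudocount
    q = np.asarray(predicted_counts, dtype=float) + pseudocount
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical counts of four cell types within one region.
reference = [50, 30, 15, 5]   # proxy ground truth from the ST data
predicted = [48, 31, 16, 5]   # counts implied by the mapped cells

kl = kl_divergence(reference, predicted)
print(f"KL divergence = {kl:.5f}")
```

      Lower values indicate that the predicted cell type composition is closer to the reference. Because the measure is asymmetric, the direction of the comparison should be kept fixed across methods.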

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. We will include additional details for this dataset in the supplementary figures. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we will include anatomical structures alongside the mapping results in the next version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Regions will be color-coded to enhance clarity and make the spatial organization easier to interpret.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we will compute the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset.

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. In other tissues, such as the liver, anatomical regions are not distinctly separated. Further evaluation of such tissues would help determine the method's broader applicability.

      Thank you for the comment. We have already tested our algorithm on the fly embryo, where anatomical structures are not well defined or clearly separated. If needed, we can further apply glmSMA to more complex tissues such as the liver. To clarify the role of anatomical structures in our model: glmSMA does not require anatomical information as input. Instead, it leverages a distance matrix between cells to apply L2 norm regularization. Despite the absence of anatomical information, the model still demonstrates strong performance. We will include results to illustrate its effectiveness without anatomical input. Additionally, we plan to evaluate the model on tissues where anatomical regions are not clearly delineated.

      Lein, E., Hawrylycz, M., Ao, N. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007). https://doi.org/10.1038/nature05453

      Reviewer #3 (Public review):

      Summary:

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and novoSpaRc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence supports the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

      While this assumption effectively captures spatial continuity in many cases, we acknowledge that it may not hold across all biological contexts. To address this, we plan to refine our regularization strategy and evaluate the model's performance in heterogeneous tissue regions.

    1. Author response:

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      Kwon et al present a very well-conducted and well-written sieve analysis of rotavirus infections in a passive surveillance network in the US, considering how relative vaccine efficacy changes with genetic distance from the vaccine strains including the whole genome. The results are compelling, supported by a number of sensitivity analyses, and the manuscript is generally easy to follow.

      Strengths:

      (1) The underlying study base, a surveillance network across multiple sites in the US.

      (2) The use of a test-negative design, which is well established for rotavirus, to estimate vaccine efficacy.

      (3) The use of genetic distance to measure differences between infecting and vaccine strains, and the innovative use of k-means clustering to make results more interpretable.

      (4) The secondary and sensitivity analyses that provide additional context and support for the primary findings.

      Weaknesses:

      (1) As identified by the authors, there is a limited sample size for the analysis of RV1 (monovalent rotavirus vaccine).

      (2) Sieve analyses were originally designed for randomized trials, in which setting their key assumptions are more likely to be met. There is little discussion in this paper of how those assumptions might be violated and what effect that might have on the results. The authors have access to some important confounders, but I believe some more discussion on potential biases in this observational study is warranted.

      We appreciate the reviewer’s positive comments and the opportunity to discuss the application of sieve analysis in observational vaccine effectiveness studies, contrasting it with its traditional use in clinical trials assessing vaccine efficacy. We fully acknowledge the reviewer's point that sieve analysis was originally developed for, and is most frequently employed in, randomized controlled trials (RCTs).

      Sieve analysis, as defined by Gilbert et al. (2001), has the following core assumptions: (A1) uniform susceptibility to infection for all participants except for vaccine-induced strain-specific effects; (A2) equal exposure distribution (for each strain s = 1,…,K) between vaccine groups; and (A3) constant strain prevalence. RCTs ensure these through randomization. However, our observational design is vulnerable to violating these assumptions, especially A1 and A3. To address A1 and A3, we adjusted for age (in years), sample collection year, and clinical setting (i.e., outpatient, inpatient, ED), aiming to account for both individual-level and temporal variations.

      A2 is particularly challenging in observational settings. We found that study site was correlated with both vaccination status (main predictor) and the strain distribution, potentially violating A2. However, adjusting for study site reversed the expected association. Upon further reflection, we realized that the site-specific differences in strain distributions likely reflect the population-level effect of vaccination, which we believe outweighs the potential confounding by study site as an independent cause of both individual-level vaccination status and strain distributions irrespective of vaccination. Thus, adjusting for site would have obscured this genuine population-level effect, and therefore we elected not to do so. We will include further discussion of this point in the revised manuscript.

      Our study demonstrates the unique capacity of sieve analysis to disentangle individual- and population-level effects on vaccine effectiveness in observational settings. We will expand on these considerations, including the potential biases inherent to observational studies and the rationale for our analytical choices, within the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study introduces a new metric for assessing the efficacy of rotavirus vaccines through the genetic distance clustering of strains. The authors analyzed variations in vaccine protection using whole genome sequencing.

      Strengths:

      Evaluating vaccine efficacy using whole genome sequencing can enhance our understanding of how pathogen evolution influences disease transmission and control.

      Weaknesses:

      While the study proposed a new method for evaluating vaccine efficacy using genetic information, its weaknesses arise from the insufficient evidence that analyses based on whole genome sequencing are more reliable than those that rely solely on VP7 and VP4 genotypes.

      Though most cases received the RV5 vaccine (n=119 compared to n=30 for RV1), Figure 2 and the primary focus of the paper concentrate on RV1, as the authors identified a stronger association with genetic distance.

      Additionally, it is unclear whether the difference between the two groups (j=0 versus j=1) is statistically significant for the analysis based on genetic distance to the RV1 strain, as well as for that based on minimum genetic distance to any of the RV5 vaccine strains. In both cases, the confidence intervals show substantial overlap.

      The authors do not seem to have used a criterion for model selection based on the number of clusters; therefore, k=2 may not represent the optimal number of clusters, particularly in relation to the genetic distance associated with the RV5 vaccine (Figure 1B), which does not appear to show a bimodal distribution.

      Finally, outcomes for RV1 are highly associated with both homotypic and heterotypic antibody responses (Supplemental Figure 10), which have already been shown to impact vaccine effectiveness (The Pediatric Infectious Disease Journal 40(12):p 1135-1143, 2021, doi:10.1097/INF.0000000000003286). Given this strong association, the benefit of using genetic distance is unclear, as the GxPx genotype serves as a good proxy for genetic similarity. 

      We sincerely appreciate the reviewer's careful consideration of our manuscript and their constructive suggestions for improvement.

      Regarding the comparison of whole-genome sequencing with traditional VP7/VP4 genotyping, we concur that a more explicit comparison would strengthen our findings. To this end, we plan to incorporate the direct comparison of genetic distance (GD) and genotype-specific vaccine effectiveness (VE) analyses into the main text. Additionally, we will conduct an analysis of VE based on homotypic, partially heterotypic, and fully heterotypic genotype groupings. This will provide a clearer demonstration of the potential added value of GD in refining VE estimates, particularly for future applications. Given the potential for reassortment among the rotavirus gene segments, our analysis highlights that relying solely on the VP7/VP4 genotype can at times be misleading. 

      Regarding k-means clustering, we wish to clarify that the selection of k=2 was not arbitrary. It was determined using the elbow method on the total within-sum-of-squares (using the fviz_nbclust function in the factoextra R package, with n=5000 bootstrapping). While we acknowledge that other methods, such as silhouette and gap statistics, may yield different optimal cluster numbers, we prioritized maximizing group sample sizes. We will explicitly state this model selection criterion within the methods section of the revised manuscript.
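
      The elbow criterion referred to above can be sketched in a few lines of Python. This is not the fviz_nbclust procedure used in the study; it is a one-dimensional k-means with a deterministic quantile initialisation, and the genetic distances are invented for illustration.

```python
def kmeans_1d(xs, k, iters=100):
    """Lloyd's algorithm on scalars; returns (centers, within-cluster SS)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start (an assumption): evenly spaced quantiles.
    centers = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    wss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wss


# Hypothetical genetic distances to a vaccine strain (two obvious groups).
distances = [0.01, 0.02, 0.03, 0.02, 0.015, 0.20, 0.22, 0.21, 0.19, 0.23]
wss_by_k = {k: kmeans_1d(distances, k)[1] for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, wss)
```

      The "elbow" is the value of k after which the total within-cluster sum of squares stops dropping sharply. For data like these the drop from k=1 to k=2 dominates, which is the pattern that would motivate k=2.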

      We acknowledge the reviewer’s concern regarding the overlapping confidence intervals and the statistical significance of the differences between the VE for the j=0 and j=1 groups. One way to address this would be to modify our analysis. Instead of two separate logistic regression models (controls vs j=0 cases, and controls vs j=1 cases), we could employ a multinomial logistic regression model with three categories: controls (reference), j=0 cases, and j=1 cases, and then conduct a Wald test to directly compare the regression slopes for the j=0 and j=1 cases against controls. We intend to explore this approach in the revised manuscript, which will provide a more rigorous assessment of differences in VE by accounting for the relationship between groups within a single model.
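
      As a rough illustration of the proposed comparison, the Python sketch below treats only the simplest, unadjusted case. With vaccination as the sole predictor, a three-category multinomial logit reproduces the empirical log odds ratios, and because both case groups share the same controls, the control terms cancel in the variance of their difference. All counts are hypothetical, and the covariate-adjusted analysis described above would require a fitted model rather than this closed form.

```python
import math


def vaccine_effectiveness(case_vax, case_unvax, ctrl_vax, ctrl_unvax):
    """Test-negative VE = 1 - OR, from a 2x2 table of counts."""
    odds_ratio = (case_vax * ctrl_unvax) / (case_unvax * ctrl_vax)
    return 1.0 - odds_ratio


def wald_compare(cases0, cases1, controls):
    """Wald z and two-sided p for log(OR_j0) - log(OR_j1).

    Each argument is a (vaccinated, unvaccinated) pair of counts.
    """
    a0, b0 = cases0
    a1, b1 = cases1
    c, d = controls
    log_or0 = math.log((a0 * d) / (b0 * c))
    log_or1 = math.log((a1 * d) / (b1 * c))
    diff = log_or0 - log_or1
    # Var(diff) = Var0 + Var1 - 2*Cov, with Cov = 1/c + 1/d (shared controls),
    # so only the four case cells remain.
    var_diff = 1 / a0 + 1 / b0 + 1 / a1 + 1 / b1
    z = diff / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical counts: (vaccinated, unvaccinated).
j0_cases, j1_cases, ctrl = (10, 20), (20, 20), (100, 50)
ve0 = vaccine_effectiveness(*j0_cases, *ctrl)
ve1 = vaccine_effectiveness(*j1_cases, *ctrl)
z_stat, p_value = wald_compare(j0_cases, j1_cases, ctrl)
print(ve0, ve1, z_stat, p_value)
```

      A single test on the difference answers the reviewer's question directly, whereas overlapping confidence intervals from two separate models do not by themselves imply a non-significant difference.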

      Reviewer #3 (Public review):

      Overall, this is an outstanding paper. It presents a novel approach to estimating rotavirus vaccine efficacy; is clearly written and presented; and has implications for this vaccine specifically as well as type-specific vaccine evaluation more generally. The analytical framework is creative and there is rigorous use of data and statistical approaches. It has long been argued that rotavirus immunity/vaccine performance operates beyond the scale of G/P genotyping. This paper is the first to demonstrate that convincingly, using data on all 11 viral genes and whole genome sequence analysis. I have only minor comments that I recommend should be addressed.

      We sincerely thank the reviewer for their highly positive assessment of our manuscript. We will carefully address their minor comments and incorporate their recommendations in the revised manuscript, which we believe will further enhance the clarity and impact of our study.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Fournier et al. investigates the sensitivity of neural circuits to changes in intrinsic and synaptic conductances. The authors use models of the stomatogastric ganglion (STG) to compare how perturbations to intrinsic and synaptic parameters impact network robustness. Their main finding is that changes to intrinsic conductances tend to have a larger impact on network function than changes to synaptic conductances, suggesting that intrinsic parameters are more critical for maintaining circuit function.

      The paper is well-written and the results are compelling, but I have several concerns that need to be addressed to strengthen the manuscript. Specifically, I have two main concerns:

      (1) It is not clear from the paper what the mechanism is that leads to the importance of intrinsic parameters over synaptic parameters.

      (2) It is not clear how general the result is, both within the framework of the STG network and its function, and across other functions and networks. This is crucial, as the title of the paper appears very general.

      I believe these two elements are missing in the current manuscript, and addressing them would significantly strengthen the conclusions. Without a clear understanding of the mechanism, it is difficult to determine whether the results are merely anecdotal or if they depend on specific details such as how the network is trained, the particular function being studied, or the circuit itself. Additionally, understanding how general the findings are is vital, especially since the authors claim in the title that "Circuit function is more robust to changes in synaptic than intrinsic conductances," which suggests a broad applicability.

      I do not wish to discourage the authors from their interesting result, but the more we understand the mechanism and the generality of the findings, the more insightful the result will be for the neuroscience community.

      Major comments

      (1) Mechanism

      While the authors did a nice job of describing their results, they did not provide any mechanism for why synaptic parameters are more resilient to changes than intrinsic parameters. For example, from Figure 5, it seems that there is mainly a shift in the sensitivity curves. What is the source of this shift? Can something be changed in the network, the training, or the function to control it? This is just one possible way to investigate the mechanism, which is lacking in the paper.

      (2) Generality of the results within the framework of the STG circuit

      (a) The authors did show that their results extend to multiple networks with different parameters (the 100 networks). However, I am still concerned about the generality of the results with respect to the way the models were trained. Could it be that something in the training procedure makes the synaptic parameters more robust than intrinsic parameters? For example, the fact that duty cycle error is weighted as it is in the cost function (large beta) could potentially affect the parameters that are more important for yielding low error on the duty cycle.

      (b) Related to (a), I can think of a training scheme that could potentially improve the resilience of the network to perturbations in the intrinsic parameters rather than the synaptic parameters. For example, in machine learning, methods like dropout can be used to make the network find solutions that are robust to changes in parameters. Thus, in principle, the results could change if the training procedure for fitting the models were different, or by using a different optimization algorithm. It would be helpful to at least mention this limitation in the discussion.

      (3) Generality of the function

      The authors test their hypothesis based on the specific function of the STG. It would be valuable to see if their results generalize to other functions as well. For example, the authors could generate non-oscillatory activity in the STG circuit, or choose a different, artificial function, maybe with different duty cycles or network cycles. It could be that this is beyond the scope of this paper, but it would be very interesting to characterize which functions are more resilient to changes in synapses, rather than intrinsic parameters. In other words, the authors might consider testing their hypothesis on at least another 'function' and also discussing the generality of their results to other functions in the discussion.

      (4) Generality of the circuit

      The authors have studied the STG for many years and are pioneers in their approach, demonstrating that there is redundancy even in this simple circuit. This approach is insightful, but it is important to show that similar conclusions also hold for more general network architectures, and if not, why. In other words, it is not clear if their claim generalizes to other network architectures, particularly larger networks. For example, one might expect that the number of parameters (synaptic vs intrinsic) might play a role in how resilient the function is with respect to changes in the two sets of parameters. In larger models, the number of synaptic parameters grows as the square of the number of neurons, while the number of intrinsic parameters increases only linearly with the number of neurons. Could that affect the authors' conclusions when we examine larger models?

      In addition, how do the authors' conclusions depend on the "complexity" of the non-linear equations governing the intrinsic parameters? Would the same conclusions hold if the intrinsic parameters only consisted of fewer intrinsic parameters or simplified ion channels? All of these are interesting questions that the authors should at least address in the discussion.

      We thank Reviewer #1 for their valuable input. We agree with the reviewer that the generality of the results may have been overstated. To address this, we changed the title of the manuscript to make it more specific to rhythmic circuits, and we included a sentence to this effect in the discussion.

      (1) We were more interested in knowing which set of conductances is more robust in a population of models, rather than a mechanism. If such a mechanism exists it will be the subject of a different study.

      (2) (a) It is impossible to explore the whole parameter space of these models. Our method to find circuits will leave subsets of circuits out of the study. Our sole goal in constructing the model database was that the activities were similar but the conductances were different.  (b) Of course one could devise a cost function targeting circuits that are more or less robust to changes in one parameter. Whether those exist is a different matter. This is not what we intended to do.

      (3) For this we would need a different circuit that produces non-oscillatory activity. A normal pyloric rhythm circuit always produces oscillatory activity unless it is “crashed” either by temperature or perturbations. Even in this case, because we don’t have a proper “control” activity (circuits crash in different ways), we would not be able to utilize the same approach.

      We think it is a valuable idea to perform a similar study in another small circuit with nonoscillatory (or rhythmic) activities. 

      (4) We did not explore the issue of how our results generalize to larger networks as it would be pure speculation. It could be potentially interesting to do a similar sensitivity analysis with a large network trained to perform a simple task. Our understanding is that many large trained networks are extremely sensitive to perturbations in synaptic weights, while the intrinsic properties of neurons in ANNs are typically oversimplified and identical across units.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents an important exploration of how intrinsic and synaptic conductances affect the robustness of neural circuits. This is a well-deserved question, and overall, the manuscript is written well and has a logical progression.

      The focus on intrinsic plasticity as a potentially overlooked factor in network dynamics is valuable. However, while the stomatogastric ganglion (STG) serves as a well-characterized and valuable model for studying network dynamics, its simplified structure and specific dynamics limit the generalizability of these findings to more complex systems, such as mammalian cortical microcircuits.

      Strengths:

      Clean and simple model. Simulations are carefully carried out and parameter space is searched exhaustively.

      Weaknesses:

      (1) Scope and Generalizability:

      The study's emphasis on intrinsic conductance is timely, but with its minimalistic and unique dynamics, the STG model poses challenges when attempting to generalize findings to other neural systems. This raises questions regarding the applicability of the results to more complex circuits, especially those found in mammalian brains and those where the dynamics are not necessarily oscillating. This is even more so (as the authors mention) because synaptic conductances in this study are inhibitory, and changes to their synaptic conductances are limited (as the driving force for the current is relatively low).

      (2) Challenges in Comparison:

      A significant challenge in the study is the comparison method used to evaluate the robustness of intrinsic versus synaptic perturbations. Perturbations to intrinsic conductances often drastically affect individual neurons' dynamics, as seen in Figure 1, where such changes result in single spikes or even the absence of spikes instead of the expected bursting behavior. This affects the input to downstream neurons, leading to circuit breakdowns. For a fair comparison, it would be essential to constrain the intrinsic perturbations so that each neuron remains within a particular functional range (e.g., maintaining a set number of spikes). This could be done by setting minimal behavioral criteria for neurons and testing how different perturbation limits impact circuit function.

      (3) Comparative Metrics for Perturbation:

      Another notable issue lies in the evaluation metrics for intrinsic and synaptic perturbations. Synaptic perturbations are straightforward to quantify in terms of conductance, but intrinsic perturbations involve more complexity, as changes in maximal conductance result in variable, nonlinear effects depending on the gating states of ion channels. Furthermore, synaptic perturbations focus on individual conductances, while intrinsic perturbations involve multiple conductance changes simultaneously. To improve fairness in comparison, the authors could, for example, adjust the x-axis to reflect actual changes in conductance or scale the data post hoc based on the real impact of each perturbation on conductance. For example, in Figure 6, the scale of the intrinsic conductance panels (e.g., g_na-bar) is 500 times larger than that of the synaptic conductance panels (a row below), but the sodium conductance reaches its maximal value perhaps only for a brief moment during every spike, and then most of the time it is close to zero. Moreover, changing the sodium conductance over the range of 0-250 for such a nonlinear current is, in many ways, unthinkable. Did you ever measure two neurons with such a difference in the sodium conductance? So, how can we tell that the ranges of the perturbations make a meaningful comparison?

      We thank Reviewer #2 for their comments. We agree with both reviewers about scope and generalizability. We changed the title of the manuscript and included a sentence in the discussion to address this. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 63: Tau_b is tau in Fig 1B? What is the 'network period' tau_n? Both are defined in the methods, but it would be good to clarify here and also in the figure.

      This was fixed. Tau_b is the bursting period, and we indicated it in the figure. Network period means the period of the network activity. This was rewritten.

      (2) Line 74: "maximal conductances g_i." What is i? I can imagine what you meant, but it would be good to clarify the notation.

      There are multiple different currents. The letter 'i' is an index over the different types. It now reads as follows:

      "The activity of the network depends on the values of the maximal conductances ḡ_i, where i is an index corresponding to the different current types (Na, CaS, CaT, Kd, KCa, A, H, Leak, IMI)"

      (3) Line 78: "conductances are changed by a random amount." How much is the "random amount"? In percentages? 

      We fixed this sentence. This is how it reads now:

      "The blue trace in Figure 1C corresponds to the activity of the same model when each of the intrinsic conductances is changed by a random amount within a range between 0 (completely removing the conductance) and twice its starting value, 2×g_i, or equivalently, an increment of 100%."

      Similarly, in Line 87: "by a similar percent." Can you provide Figures 1E-F in percentages? Are the percentages the same?

      The phrase "by a similar percent" is misleading and unimportant. Thank you, we removed it.
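
      The perturbation range quoted above (each conductance redrawn between 0 and twice its starting value) can be sketched as follows in Python. Uniform sampling, the function name, and the conductance values are assumptions made for illustration; the quoted sentence specifies only the range.

```python
import random


def perturb_conductances(g_start, rng):
    """Redraw each maximal conductance from [0, 2*g], i.e. up to +/-100%.

    Uniform sampling is an assumption; the text only states the range.
    """
    return {name: rng.uniform(0.0, 2.0 * g) for name, g in g_start.items()}


# Hypothetical starting values in microsiemens for one model neuron.
g_start = {"Na": 1000.0, "CaS": 10.0, "CaT": 5.0, "Kd": 120.0, "Leak": 0.05}
rng = random.Random(0)
g_perturbed = perturb_conductances(g_start, rng)
for name in g_start:
    print(name, g_start[name], g_perturbed[name])
```

      Because the upper bound scales with the starting value, every conductance is perturbed by the same relative amount (at most 100% in either direction) regardless of its absolute magnitude.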

      (4) Line 113: Why did you add I_MI? Is it important for the results or for the conclusions?

      I_MI was added because the current is known to be there and it is not more or less important for the results or conclusions than any other current. 

      (5) Line 117: "We used a genetic algorithm to generate a database." Confusing. I guess you meant that you used genetic algorithms to optimize the cost function.

      Thank you for this comment. We fixed this sentence, see below. 

      “We used a genetic algorithm to optimize the cost function, and in this way generated a database of N = 100 models with different values of maximal conductances (Holland 88)."

      (6) Line 136: "The models in the database were constrained to produce solutions whose features were similar to the experimental measurements." Why are there differences in the features? Is this an optimization issue? I thought you wanted to claim that there are degenerate solutions, that is, solutions where the parameters are different, but the output is identical. Please clarify.

      The concept of degenerate solutions does not imply that the solutions are mathematically identical. In biology, this means that they provide very similar functions but do so with different underlying parameters (in this case, maximal conductances). The activity of the pyloric network is slightly different across animals, and it also changes over time within the same individual. Variation across models reflects individual variation in the biological circuit, and it is a strength of our modeling approach. The circuits function equally well because they produce biologically realistic patterns, although the details of the activity patterns differ.

      (7) Line 139: "distributed (p > 0.05)." What test did you use? N? Similarly, at Lines 218, 241, 239, etc. Please be more rigorous when reporting statistical tests.

      Thank you. We now specify the test we utilized every time we report a p value. 

      (8) Line 143: "In this case, it is not possible to identify clusters, suggesting that there are no underlying relationships between the features in the model database." The 2D plot is misleading, as the features are in 11 dimensions. Claims should be about the 11D space, not projections onto 2D. In fact, I don't think you can rule out correlations between the features based on the 2D plots. For example, shouldn't there be correlations between the on and off phases and the burst durations?

      Thank you. These sentences were confusing and were removed. We added the following sentence to the end of that paragraph.

      "Because the feature vectors are similar, their t-SNE projections do not form groups or clusters."

      (9) Related to this, I don't understand this sentence: "Even though the conductances are broadly distributed over many-fold ranges, the output of the circuits results in tight yet uncorrelated distributions.”

      This sentence is confusing and was removed. 

      (10) Line 158: Repetition of Line 152: Figure 3 shows the currentscapes of each cell in two model networks.

      We removed the second instance of the repeated sentences. 

      (11) Line 160: "yet the activity of the networks is similar." Well, they are similar, but not identical. I can also say that the current scapes are 'similar'. This should be better quantified and not left as a qualitative description.

      While this is an interesting point it will not change the results and conclusions of the present study. The network models are different since the values of their maximal conductances are distributed over wide ranges.  

      (12) Line 218: midpoint parameter? Is that b - the sharpness? Please be consistent. Regarding the mechanism (see above) - any ideas what leads to this shift in the sensitivity curves between the two types of parameters?

      Yes, we made a mistake. ‘b’ is the midpoint parameter. This was fixed in the text, thank you.

      (13) Figure 6 illustrates why synaptic parameters are more robust, but it is not quantified. Why not provide a quantitative measure for this claim? For example, calculate the colored area within the white square for each pair, for each cell, and for each model. Show that these measures can predict improved robustness for one model over another and for synaptic vs. intrinsic parameters.

      The ratio of areas of the colored and non-colored regions in the whole hyperboxes (for intrinsic and synaptic conductances) is the number reported in the y-axis of the sensitivity curves when we include all conductances (and not just a pair). 
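
      The area ratio described above can be approximated by classifying points on a regular grid and taking the fraction that pass. The Python sketch below assumes a hypothetical predicate, is_pyloric, standing in for a full network simulation followed by the pyloric-activity criteria, and uses a toy region so that the example runs on its own.

```python
def pyloric_fraction(is_pyloric, g1_range, g2_range, n=100):
    """Fraction of an n-by-n grid of (g1, g2) cell midpoints for which
    is_pyloric(g1, g2) is True; approximates the occupied area ratio."""
    lo1, hi1 = g1_range
    lo2, hi2 = g2_range
    hits = 0
    for i in range(n):
        g1 = lo1 + (i + 0.5) * (hi1 - lo1) / n
        for j in range(n):
            g2 = lo2 + (j + 0.5) * (hi2 - lo2) / n
            if is_pyloric(g1, g2):
                hits += 1
    return hits / (n * n)


def toy_is_pyloric(g1, g2):
    """Placeholder (assumed): 'pyloric' whenever g1 is in the lower half."""
    return g1 < 0.5


fraction = pyloric_fraction(toy_is_pyloric, (0.0, 1.0), (0.0, 1.0))
print(fraction)
```

      In a real analysis each predicate call would be a simulation, so the grid resolution trades accuracy against computing time.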

      We computed the ratios of the colored/noncolored areas in all panels in Figure 6 and now report these quantities as follows:

      "We computed the proportions of areas of the white boxes that correspond to pyloric activity. These values for the intrinsic conductances panels are PD = 0.58, LP = 0.50, PY = 0.49, and the proportions for the synaptic conductances panels are PDPY = 0.62, PDLP = 0.87, and LPPD = 0.94. The occupied areas for synaptic conductances are larger than in the intrinsic conductances panels, consistent with our finding that the circuits’ activities are more robust to changes in synaptic conductances versus changes in intrinsic conductances."

      "As before, we computed the proportion of areas of pyloric activity within the white boxes: PD = 0.61, LP = 0.55, PY = 0.52, and the proportions for the synaptic conductances panels are PDPY = 0.88, PDLP = 0.87, and LPPD = 0.83. These results provide an intuition of the complexities of GP. Not only are these regions hard-to-impossible to characterize in one circuit, but they are also different across circuits."

      (14) Does the sign of the synaptic weights affect the conclusions?

      We did not explore this issue because all chemical synapses in this network are inhibitory.

      (15) Line 492: typo: deltai.

      We fixed this.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 301 - you can also add Williams and Fletcher 2019 Neuron.

      We added the reference. Thank you. 

      (2) Line 316 - this is a strange comment, as these are the exact regions that were shown to exhibit intrinsic plasticity (e.g., Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. "Compartmentalized dendritic plasticity and input feature storage in neurons." Nature 452.7186 (2008): 436-441).

      We did not understand this comment. 

      (3) I found only one citation for the work of Turrigiano, the most relevant of which is only mentioned in the Method section. This is odd, as her work directly relates how synaptic conductance perturbation results in changes in intrinsic conductance.

      We included more references to the work of Turrigiano to provide more context.

      Desai, Niraj S., Lana C. Rutherford, and Gina G. Turrigiano. "Plasticity in the intrinsic excitability of cortical pyramidal neurons." Nature neuroscience 2, no. 6 (1999): 515-520.

      Desai, Niraj S., Sacha B. Nelson, and Gina G. Turrigiano. "Activity-dependent regulation of excitability in rat visual cortical neurons." Neurocomputing 26 (1999): 101-106.

      (4) Line 329 - The list of citations is very limited regarding studies of ext/int balance which started really way before 2009. Please give some of the credit to the classics.

      We included the following additional references.

      Van Vreeswijk, Carl, and Haim Sompolinsky. "Chaos in neuronal networks with balanced excitatory and inhibitory activity." Science 274, no. 5293 (1996): 1724-1726.

      Rubin, Ran, L. F. Abbott, and Haim Sompolinsky. "Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity." Proceedings of the National Academy of Sciences 114, no. 44 (2017): E9366-E9375.

      Wang, Xiao-Jing. "Macroscopic gradients of synaptic excitation and inhibition in the neocortex." Nature reviews neuroscience 21, no. 3 (2020): 169-178.

      Lo, Chung-Chuan, Cheng-Te Wang, and Xiao-Jing Wang. "Speed-accuracy tradeoff by a control signal with balanced excitation and inhibition." Journal of Neurophysiology 114, no. 1 (2015): 650-661.

      (5) In Figure 1B, why does it say 'OFF' when the neuron is spiking?

      The label indicates the interval of time elapsed between the first spike in the PD neuron (taken as a reference), and the last spike in the burst (PD off). 

      Summary of changes to figures:

      Figure 1:

      Fixed labels indicating bursting period and burst duration.

      Figure 5:

      Added labels in panels C and D specifying the symbol corresponding to the sigmoidal parameter.

      Additional changes

      We changed the title of the manuscript as follows:

      "Rhythmic circuit function is more robust to changes in synaptic than intrinsic conductances."

      We included the following sentence at the end of the Discussion section.

      "We believe our results will hold for other rhythmic circuits and will be relevant for similar studies in other circuits with more complex functions."

      We realized we made a mistake with the units for maximal conductances. They were incorrectly expressed in nS (nanosiemens) in the figure labels and correctly expressed in µS (microsiemens) in the Methods section. This has been fixed, and conductances are now expressed in µS consistently throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reply to the comments of the second referee

      We sincerely appreciate the positive evaluation and the useful suggestions on our manuscript.

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to understand better the differences and similarities in the obtained results. Do both methods identify substrate-level regulators? Is freezing a metabolite's dynamics dramatically changing the metabolic response (and if yes, which ones are so different in the two cases)? Does the scope of the network affect these differences and similarities? 

      Thank you for these suggestions. We compared the Sobol' total sensitivity index with the absolute values of the change in the response coefficient (Figure S6 in the revised manuscript). There is no clear relationship between the two quantities. The Sobol' sensitivity analysis quantifies how a perturbation on the concentration of a metabolite X contributes to the overall dynamics. On the other hand, the analysis in which metabolites' concentrations are fixed measures how strongly metabolite X helps propagate the perturbations on the other metabolites throughout the metabolic network. In other words, in the Sobol' analysis, we evaluate the outcome when the perturbation is applied directly to metabolite X, whereas in the fixing-metabolites analysis, we consider perturbations applied to other metabolites and assess how X influences those perturbations. We believe this conceptual difference explains why the two quantities do not correlate. We suspect that this lack of correlation is independent of the network's scope, because each method evaluates a different aspect of the system. Both methods identify the effect of a metabolite's dynamics on the overall dynamics regardless of the mechanism; that is, neither method distinguishes whether a perturbation on the metabolite affects the overall dynamics through stoichiometry (as a reactant) or through substrate-level regulation. Identifying substrate-level regulation with these methods would therefore be challenging.

      (2) Regarding the issues the authors encountered when performing the sensitivity analysis, they can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Otherwise, the conserved pools of metabolites can be considered as variables in the sensitivity analysis; grouping multiple parameters is a common approach in sensitivity analysis.

      Thank you for this helpful suggestion. Following the method described in the reference, we have computed the Sobol' sensitivity index of NADH, NADPH, and Q8H2 (with their counterparts algebraically solved and treated as dependent variables). We have updated Figure S5 accordingly.
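
      As an illustration of the conserved-pool treatment discussed in comment (2), the following minimal sketch finds conservation relations as the left null space of a stoichiometric matrix and then expresses one member of a pool as a dependent variable. The four-metabolite toy network (A, B, NAD, NADH), its reactions, and all function names are hypothetical and chosen only for illustration; this is not the model or the code used in the manuscript, and the exact row reduction shown here is a generic stand-in rather than the specific algorithm of the cited reference.

```python
from fractions import Fraction


def null_space(rows):
    """Basis of {x : M x = 0}, computed exactly by Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivot column of each reduced row, in order
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -m[row_idx][f]
        basis.append(vec)
    return basis


def conserved_moieties(stoich):
    """Vectors g with g^T S = 0, i.e. the null space of S transposed."""
    transposed = [list(col) for col in zip(*stoich)]
    return null_space(transposed)


def dependent_concentration(pool_total, independent):
    """Solve a two-member pool algebraically: dependent = total - independent."""
    return pool_total - independent


# Hypothetical toy network (rows: metabolites, columns: reactions).
# r1: A + NAD -> B + NADH, r2: NADH -> NAD, r3: -> A, r4: B ->
METABOLITES = ["A", "B", "NAD", "NADH"]
S = [
    [-1, 0, 1, 0],   # A
    [1, 0, 0, -1],   # B
    [-1, 1, 0, 0],   # NAD
    [1, -1, 0, 0],   # NADH
]

for g in conserved_moieties(S):
    members = [name for name, coeff in zip(METABOLITES, g) if coeff != 0]
    print("conserved pool:", " + ".join(members))
```

      In this toy network the only conservation relation is NAD + NADH = constant, so NADH can be perturbed as the independent variable while NAD follows from `dependent_concentration`. A metabolite that appears in several basis vectors belongs to several pools, which is the case the reviewer warns about.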

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary:

      The authors examine the role of the medial prefrontal cortex (mPFC) in cognitive control, i.e. the ability to use task-relevant information and ignore irrelevant information, in the rat. According to the central-computation hypothesis, cognitive control in the brain is centralized in the mPFC and according to the local hypothesis, cognitive control is performed in task-related local neural circuits. Using the place avoidance task which involves cognitive control, it is predicted that if mPFC lesions affect learning, this would support the central computation hypothesis whereas no effect of lesions would rather support the local hypothesis. The authors thus examine the effect of mPFC lesions in learning and retention of the place avoidance task. They also look at functional interconnectivity within a large network of areas that could be activated during the task by using cytochrome oxidase, a metabolic marker. In addition, electrophysiological unit recordings of CA1 hippocampal cells are made in a subset of (lesioned or intact) animals to evaluate overdispersion, a firing property that reflects cognitive control in the hippocampus. The results indicate that mPFC lesions do not impair place avoidance learning and retention (though flexibility is altered during conflict training), do not affect cognitive control seen in hippocampal place cell activity (alternation of frame-specific firing), a measure of location-specific firing variability, in pretraining. It nevertheless has some effect on functional interconnections. The results overall support the local hypothesis. 

      Strengths:

      Straightforward hypothesis: clarification of the involvement of the mPFC in the brain is expected and achieved. Appropriate use of fully mastered methods (behavioral task, electrophysiological recordings, measure of metabolic marker cytochrome oxidase) and rigorous analysis of the data. The conclusion is strongly supported by the data. 

      Weaknesses:

      No notable weaknesses in the conception, making of the study, and data analysis. The introduction does not mention important aspects of the work, i.e. cytochrome oxidase measure and electrophysiological recordings. The study is actually richer than expected from the introduction. 

      The revised Introduction now includes:

      “We used cytochrome oxidase, a metabolic marker of baseline neuronal activity, to confirm the mPFC lesions were effective and that there are non-local network consequences despite the local lesion. We first evaluated cytochrome oxidase activity in regions known to be associated with performance in the active place avoidance task, or regions with known connectivity to the mPFC. We then evaluated covariance of activity amongst the regions in an effort to detect network consequences of the lesion.”

      Reviewer #2 (Public review): 

      Park et al. set out to test two competing hypotheses about the role of the medial prefrontal cortex (PFC) in cognitive control, the ability to use task-relevant cues and ignore task-irrelevant cues to guide behavior. The "central computation" hypothesis assumes that cognitive control relies on computations performed by the PFC, which then interacts with other brain regions to accomplish the task. Alternatively, the "local computation" hypothesis suggests that computations necessary for cognitive control are carried out by other brain regions that have been shown to be essential for cognitive control tasks, such as the dorsal hippocampus and the thalamus. If the central computation hypothesis is correct, PFC lesions should disrupt cognitive control. Alternatively, if the local computation hypothesis is correct, cognitive control would be spared after PFC lesions. The task used to assess cognitive control is the active place avoidance task in which rats must avoid a section of a rotating arena using the stationary room cues and ignoring the local olfactory cues on the rotating platform. Performance on this task has previously been shown to be disrupted by hippocampal lesions and hippocampal ensembles dynamically represent the room and arena depending on the animal's proximity to the shock zone. They found no group (lesion vs. sham) differences in the three behavioral parameters tested: distance traveled, latency to enter the shock zone, and number of shock zone entries for both the standard task and the "conflict" task in which the shock zone was rotated by 180 degrees. The only significant difference was the savings index; the lesion group entered the new shock zone more often than the sham group during the first 5 minutes of the second conflict session. This deficit was interpreted as a cognitive flexibility deficit rather than a cognitive control failure. 
      Next, the authors compared cytochrome oxidase activity between sham and lesion groups in 14 brain regions and found that only the amygdala showed significant elevation in the lesion vs. sham group. Pairwise correlation analysis revealed a striking difference between groups, with many correlations between regions lost in the lesion group (between reuniens and hippocampus, and between reuniens and amygdala) and a correlation between dorsal CA1 and central amygdala that appeared in the lesion group and was absent in the sham group. Finally, the authors assessed dorsal hippocampal representations of the spatial frame (arena vs. room) and found no differences between lesion and sham groups. The only difference in hippocampal activity was reduced overdispersion in the lesion group compared to the sham group on the pretraining session only and this difference disappeared after the task began. Collectively, the authors interpret their findings as supporting the local computation hypothesis; computations necessary for cognitive control occur in brain regions other than the PFC. 

      Strengths:

      (1) The data were collected in a rigorous way with experimental blinding and appropriate statistical analyses. 

      (2) Multiple approaches were used to assess differences between lesion and sham groups, including behavior, metabolic activity in multiple brain regions, and hippocampal single-unit recording. 

      Weaknesses:

      (1) Only male rats were used with no justification provided for excluding females from the sample.

      This is a weakness we acknowledge. The experiments were performed at a time when we did not have female rats in the lab.

      (2) The conceptual framework used to interpret the findings was to present two competing hypotheses with mutually exclusive predictions about the impact of PFC lesions on cognitive control. The authors then use mainly null findings as evidence in support of the local computation hypothesis. They acknowledge that some people may question the notion that the active place avoidance task indeed requires cognitive control, but then call the argument "circular" because PFC has to be involved in cognitive control. This assertion does not address the possibility that the active place avoidance task simply does not require cognitive control. 

      We beg to differ that the possibility was not addressed. Prior to making the assertion, the manuscript describes the evidence that the active place avoidance task requires cognitive control. The evidence is multifold, and includes task design, behavior, and electrophysiology; we argue that this is more evidence than has been provided for other tasks that are asserted to require cognitive control. Specifically, line 417 states:

      “We have previously demonstrated cognitive control in the active place avoidance task variant we used (Fig. 1) because the rats must ignore local rotating place cues to avoid the stationary shock zone. Even when the arena does not rotate, rats distinctly learn to avoid the location of shock according to distal visual room cues and local olfactory arena cues, such that the distinct place memories can be independently manipulated using probe trials [49, 50]. When the arena rotates as in the present studies, neural manipulations that impair the place avoidance are no longer impairing when the irrelevant arena cues are hidden by shallow water [14, 15, 51, 52]. Furthermore, persistent hippocampal neural circuit changes caused by active place avoidance training are not detected when shallow water hides the irrelevant arena cues to reduce the cognitive control demand [10, 31, 33]. While these findings unequivocally demonstrate the salience of relevant stationary room cues to use for avoiding shock and irrelevant arena cues to ignore during active place avoidance, the most compelling evidence of cognitive control comes from recording hippocampal ensemble discharge. Hippocampal ensemble discharge purposefully represents current position using stationary room information when the subject is close to the stationary shock zone and alternatively represents rotating arena information when the mouse is far from the stationary shock zone [Fig. 4; 10].”

      Line 436, however, acknowledges a fact that will always be true no matter what anyone opines: until there are universally agreed-upon objective criteria, it is logically possible that active place avoidance does not require cognitive control. The revision states: Despite this evidence from task design, behavioral observations, and direct electrophysiological representational switching as required to directly demonstrate cognitive control, one might still argue that it is logically possible that the active place avoidance task does not require cognitive control and this is why the mPFC lesion did not impair place avoidance of the initial shock zone. We consider such reasoning to be unproductive because it presumes that only tasks that require an intact mPFC can be cognitive control tasks. We nonetheless acknowledge that for some, we have not provided sufficient evidence that the active place avoidance requires cognitive control.

      “We assert the evidence is compelling, and together these findings require rejecting the central-computation hypothesis that the mPFC is essential for the neural computations that are necessary for all cognitive control tasks.”

      (3) The authors did not link the CO activity with the behavioral parameters even though the CO imaging was done on a subset of the animals that ran the behavioral task nor did they make any attempt to interpret these findings in light of the two competing hypotheses posed in the introduction. Moreover, the discussion lacks any mechanistic interpretations of the findings. For example, there are no attempts to explain why amygdala activity and its correlation with dCA1 activity might be higher in the PFC lesioned group. 

      The CO study was performed to assess the effects of the lesion, as stated on line 262 “Cytochrome oxidase (CO), a sensitive metabolic marker for neuronal function [27], was used to evaluate whether lesion effects were restricted to the mPFC.” Furthermore, as a matter of fact, line 411 states “Thus, CO imaging and electrophysiological evidence identify changes in the brain beyond the directly damaged mPFC area. In particular, the dorsal hippocampus loses the inhibitory input from mPFC [45, 46] and loses the metabolic correlation with the nucleus reuniens, which is thought to be a relay between the mPFC and the dorsal hippocampus [47, 48].”

      These CO measures assess baseline metabolic function and so it would be inappropriate to correlate them with the measures of behavior. Because the lesion and control groups do not differ on most measures of behavior, a relationship to CO measures is not expected. Importantly, even if there were differences in correlations between CO activity and behavioral measures, what could they mean? The study was designed to distinguish between two hypotheses, not to determine what CO differences could mean for behavior. As such, it is not at all clear how metabolic consequences of the lesion relate to the two hypotheses being evaluated, and so we consider it inappropriate to speculate. We did examine, and now include, the correlation between lesion size and conflict behavior. The Fig. 1 legend states “Savings was not related to lesion size r = 0.009, p = 0.98. *p < 0.05.”

      (4) Publishing null results is important to avoid wasting animals, time, and money. This study's results will have a significant impact on how the field views the role of the PFC in cognitive control. Whether or not some people reject the notion that the active place avoidance task measures cognitive control, the findings are solid and can serve as a starting point for generating hypotheses about how brain networks change when deprived of PFC input. 

      We thank the reviewer for the acknowledgement.

      Reviewer #3 (Public review): 

      Summary:

      This study by Park and colleagues investigated how the medial prefrontal cortex (mPFC) influences behavior and hippocampal place cell activity during a two-frame active place avoidance task in rats. Rats learned to avoid the location of mild shock within a rotating arena, with the shock zone being defined relative to distal cues in the room. Permanent chemical lesions of the mPFC did not impair the ability to avoid the shock zone by using distal cues and ignoring proximal cues in the arena. In parallel, hippocampal place cells alternated between two spatial tuning patterns, one anchored to the distal cues and the other to the proximal cues, and this alteration was not affected by the mPFC lesion. Based on these findings, the authors argue that the mPFC is not essential for differentiating between task-relevant and irrelevant information. 

      Strengths:

      This study was built on substantial work by the Fenton lab that validated their two-frame active place avoidance task and provided sound theoretical and analytical foundations. Additionally, the effectiveness of mPFC lesions was validated by several measures, enabling the authors to base their argument on the lack of lesion effects on behavior and place cell dynamics. 

      Weaknesses:

      The authors define cognitive control as "the ability to judiciously use task-relevant information while ignoring salient concurrent information that is currently irrelevant for the task." (Lines 77-78). This definition is much simpler than the one by Miller and Cohen: "the ability to orchestrate thought and action in accordance with internal goals (Ref. 1)" and by Robbins: "processes necessary for optimal scheduling of complex sequence of behaviour." (Dalley et al., 2004, PMID: 15555683). Differentiating between task-relevant and irrelevant information is required in various behavioral tasks, such as differential learning, reversal learning, and set-shifting tasks. Previous rodent behavioral studies have shown that the integrity of the mPFC is necessary for set-shifting but not for differential or reversal learning (e.g., Enomoto et al., 2011, PMID: 21146155; Cho et al., 2015, PMID: 25754826). In the present task design, the initial training is a form of differential learning between proximal and distal cues, and the conflict training is akin to reversal learning. Therefore, the lack of lesion effects is somewhat expected. It would be interesting to test whether mPFC lesions impair set-shifting in their paradigm (e.g., the shock zone initially defined by distal cues and later by proximal cues). If the mPFC lesions do not impair this ability and associated hippocampal place dynamics, it will provide strong support for the authors' local computation hypothesis.

      Thank you for these comments. In addressing them we have provided a significant revision to the manuscript’s Introduction. While authors like those cited by the reviewer have defined cognitive control, those definitions are difficult to test rigorously, as it is almost a matter of opinion whether a subject is displaying “the ability to orchestrate thought and action in accordance with internal goals" or whether they are using "processes necessary for optimal scheduling of complex sequence of behaviour." What would such definitions of cognitive control predict about neuronal activity? We have deliberately used a simple, operational definition of cognitive control because it is physiologically testable. In the revision, starting at line 93, we have provided an excerpt from Miller and Cohen (2001) with discussion. The importance of that work is that it provides explicit neuronal criteria and a means to operationally define cognitive control. As stated on Line 118 “Accordingly, cognitive control would be at work when there is sustained neuronal network representations of task-relevant information that suppresses or gates representations of salient task-irrelevant information in accord with purposeful judicious behavior.”

      We used an R+A- task variant in which there is a stationary room-frame shock zone and task-irrelevant arena-frame information. A strict correspondence to set-shifting task design cannot be accomplished with active place avoidance because an A+R- task that requires avoiding an arena-frame shock zone in the absence of a room-frame shock zone can be accomplished trivially if the subject chooses to not move when it is in a place with no shock. However, the R+A+ task variant is readily learned, in which there is both a room-frame and an arena-frame shock zone (see cited work below). This task variant requires the subject to judiciously shift between avoiding the room-frame shock zone using stationary room information and avoiding the arena-frame shock zone using rotating arena information. This R+A+ task variant might meet the reviewer’s criteria for cognitive control. We have recorded hippocampal and entorhinal ensemble activity during the R+A+ task variant and it is very similar to the activity during the R+A- task we used. Nonetheless, future work will investigate the effect of mPFC lesion on the R+A+ task variant.

      Cited work:

      Fenton AA, Wesierska M, Kaminsky Y, Bures J (1998), Both here and there: simultaneous expression of autonomous spatial memories in rats. Proc Natl Acad Sci U S A 95:11493-11498.

      Kelemen E, Fenton AA (2010), Dynamic grouping of hippocampal neural activity during cognitive control of two spatial frames. PLoS Biol 8:e1000403.

      Burghardt NS, Park EH, Hen R, Fenton AA (2012), Adult-born hippocampal neurons promote cognitive flexibility in mice. Hippocampus 22:1795-1808.

      Park EH, Keeley S, Savin C, Ranck JB, Jr., Fenton AA (2019), How the Internally Organized Direction Sense Is Used to Navigate. Neuron 101:1-9.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      (1) Incorporate the cytochrome oxidase and hippocampal recordings (rationale and hypothesis) in the introduction, explaining how these aspects are relevant to the general question. 

      We have done this as requested. See lines 159-173 of the revised introduction.

      (2) Figure 1C. On Day 4-5 (conflict training) in which the shock zone was relocated 180 deg from the initial location, the behavioral tracks did not show any presence of the rat in this sector (in particular for the lesion example). Figure 4 nevertheless indicates that entrances have been made (which was expected since rats have to know that the shock zone was relocated).

      Thanks for pointing this out. The tracks are from the end of the sessions. The labels have been changed to specify which trials the tracks are from.

      (3) Figure 1C. The caption is huge as it contains the statistical analyses details. I would prefer to have these details in the text and keep the caption at a "reasonable" length. At the end of the caption (l. 190-191), it would be less confusing to keep the numbering of the training days: replace D1T1 with D2T1 and D2T9 with D3T9.

      The statistical details have been relocated to the main text and the numbering updated, as suggested, thank you.

      (4) It was not inconsiderable to show that mPFC lesion had some effects in the present task if it were only to validate the effectiveness of the lesion. This brain area has been shown to be important for planning, cognitive flexibility, etc. Indeed the authors found that the saving index was greater in sham than in mPFC rats (overdispersion in hippocampal firing was also reduced in pretraining) and interpreted this result as impaired flexibility. Would an alternative explanation be a memory deficit? I nevertheless expected that impaired flexibility in mPFC rats would be expressed in conflict trials in the form of more entrances in the zone that was initially not associated with shock (at least in the first trials of Day 4). But it appears to not be the case.

      A memory deficit is unlikely to explain the difference between the groups on the first trial of Day 5. Memory in the lesion rats was tested multiple times, specifically at the start of each trial (time to first entrance), including on the 24-h retention test, and no deficits were observed. Performance on Day 9 trial 1 is worse in the lesion group than in the controls, but it is not parsimonious to attribute this to a simple memory deficit since 24-h memory was good and similar between lesion and control rats on days 3 and 4, and memory on Day 5 was equally poor in both the lesion and control rats, as measured by time to first entrance.  

      (5) Material and methods. The injected volume of ibotenic acid should be mentioned. 

      The volume 0.2 µl was added. See line 531.

      (6) The rationale for doing the conflict training session should be indicated somewhere. 

      The rationale was provided. See lines 204-208.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 132: The text states that all sham rats improved and only 6/10 lesion rats improved is followed by a t-test, which tests the difference between means; it does not compare proportions. Also, what criterion was used to determine if an improvement was seen or not? 

      The statistical comparison is provided (now line 230: test of proportions z = 2.3, p = 0.03). Improvement was simply numerically fewer entrances.

      (2) Line 138: This is a very long and confusing sentence. Consider revising for clarity. 

      The sentence (now line 234) was revised.

      (3) Figure 1B only includes data from 3 animals. Most published studies show the whole dataset by presenting the largest and smallest lesions. 

      Supplemental Figure S2 was added with all the lesions depicted and quantified.

      (4) Figure 1C suggestion to make the schematic shock zone line up with the shock zone shown for the tracking data. 

      Graphically, it looks better as drawn because it uses perspective to depict a three-dimensional structure.

      (5) Methods: Clarify if the shock zone location was the same across all rats. 

      Line 570 states that the shock zone was the same for all rats.

      (6) Line 158: "Behavioral tracks" is not clear. Suggest more precise wording.

      Reworded to “Tracked room-frame positions” (now line 249)

      (7) Line 166: "effect of trial" - should this be the main effect of trial?; "interaction" - should this be "group x trial" interaction? 

      Reworded (now line 181).

      (8) Line 167: "or their interaction" is awkward in the context of the sentence. 

      Reworded (now line 182).

      (9) Line 182: Avoid talking about "trends" as if they are almost significant unless the authors suspect that they did not have sufficient statistical power to detect differences. In that case, a power analysis should be provided. 

      Removed.

      (10) Line 190: "left:...right..." is hard to follow, especially with acronyms like D1T1. Consider revising for clarity. 

      Revised (now lines 246-248).

      (11) Line 195: "effectiveness of the PFC to impair" is unnecessarily verbose. 

      Reworded (now lines 255-257).

      (12) Savings results: There is a lot of variability in the lesion group. It would be interesting to know if the extent of the lesion correlates with savings.

      Savings was not related to lesion size. See line 259.

      (13) Line 300: The thalamic recording results are not reported in the results section (other than appearing in the table). Moreover, there is no detail about which thalamic nucleus these recordings are from.

      Lines 411 and 614 provide these details.

      (14) Line 312: "no longer impair" contains a grammatical error. 

      Corrected (now line 422)

      (15) Line 325: "was not impairing" contains a grammatical error. 

      Corrected (now line 437).

      (16) Line 327: The sentence ending with "...opinion of others" seems unnecessarily confrontational. 

      Previous reviewers at other journals have maintained this position; we therefore included such a strong statement in our initial submission. However, we have now revised this statement to avoid appearing confrontational.

      (17) Line 329: Sentence is awkward. Consider revising. 

      Revised (now line 443).

      (18) Line 384: The authors should disclose if there was an objective metric for determining the adequacy of the lesion. 

      The lesion assessment and quantification is better explained in the Methods under “Cytochrome oxidase activity and Nissl staining,” (lines 708-714).

      (19) Line 385: The authors should clarify how they got from 15 rats (Line 376) to 10. 

      This information is provided in the methods.

      (20) Line 390: It is not clear why skin irritation in the cage mate would prevent the rat from being tested. 

      This has been explained in the Methods under “Behavioral analysis followed by cytochrome oxidase activity” (lines 515-518).

      (21) Methods section: The authors should describe how the tracking data were acquired. Overhead camera? Tracker based on luminance or body position? What software program was used? What was the sampling rate? 

      This is now better explained in the Methods under “Active place avoidance task” (lines 538-551).

      (22) Methods section: Include how fast the arena was rotating and other details about the task such as where rats were placed during the ITI. 

      Better explained in the Methods under “Active place avoidance task”.

      (23) Line 439: The recording system used (hardware & software) should be stated. 

      This is now included in the Methods (line 538).

      (24) Line 435: Though overdispersion calculation is described thoroughly, there is nothing in the paper that tells me what overdispersion means. 

      What the measure means is now described in the Methods under “Electrophysiology data analysis” (lines 646-650).

      (25) Line 561: The test used to assess effect sizes should be stated. 

      Effect sizes corresponding to the statistical tests are provided.

      Reviewer #3 (Recommendations for the authors): 

      (1) At the end of the conflict training, rats with mPFC lesions learned to avoid the new shock zone (Figure 1F, Block 16), but their place cells did not show room-preferring activity near the shock zone (Figure 4B). This observation questions whether spatial frame-specific representation is relevant for active avoidance. Can the authors clarify this point?

      This is a dynamic behavior, and the hippocampal dynamics match it, changing on a timescale of a few seconds, as we have shown in several published papers. The lack of a preference averaged over 20 minutes, when the rats are avoiding both the current and former shock zones during the conflict session, is what would be expected from such a coarse measurement. The important measure is the spatially resolved measure of room versus arena preference. Figure 4B shows that in the lesion rats there is generally less of a frame preference during conflict (consistent with poorer flexibility). However, Figure 4D quantifies the frame preference near and far from the shock zone, and by that measure there is no difference between the groups.

      (2) Related to the point above, the author might consider including panels in Figures 4C and D to show the neural activity during the pretraining and conflict training retention period. I assume p(room) will be comparable between the Near and Far segment in both sessions, but the p(room) may be higher in the Conflict training session than the Pretraining session. This would show that the mPFC lesion impairs suppressing the place cell activity encoding the old shock location. 

      Thanks for the suggestion. While we don’t think we can draw any strong conclusions from this analysis we are fine to show it. The issue is that during conflict, the rats have two perfectly reasonable representations of where there was shock, the initial location that was turned off to make the conflict, and the most recent conflict location of shock. Importantly, these recordings are during conflict retention after we turned off the shock for the retention recording (for the second time in the rat’s experience). Turning off the shock allows us to exactly match the physical conditions of pretraining, initial retention and conflict retention, which was the experimental design’s goal. However, the experiential history of the rats prior to initial retention and conflict retention cannot match, because during initial retention the rats had never experienced a changed shock zone whereas, by conflict retention, they had experienced multiple changes. Importantly, we have previously shown that mouse hippocampal ensembles represent both initial and conflict shock locations, as the animals consider their options during conflict trials (see Dvorak et al 2018, PLoS Biol 16:e2003354). Consequently, we cannot make any strong predictions about whether or not hippocampal activity during conflict retention should be room-frame preferring selectively in the vicinity of the current shock zone. As I am sure the reviewer appreciates from their own introspection, mental representations are mercifully not obliged to dictate behavior. In fact, that is what is interesting and controversial about cognitive control – it is a dynamic internal process and the innovation of our work lies in demonstrating that one cannot rely only on behavior to assess this process. Nonetheless, we did this analysis and now present it in the revised Fig. 4. During pretraining both lesion and sham groups express no particular spatially-modulated preference for either the room or the arena frame, as expected. During initial training both groups express a room-frame preference in the vicinity of the shock zone, as we initially reported. By inspection, during conflict, the sham rats express a preference for room-frame activity in the vicinity of the most recent shock zone location; this preference is weaker than what is expressed during initial retention. The lesion rats do not show this preference. These impressions are quantified in revised Fig. 4D; the comparisons within the conflict retention sessions did not reach statistical significance. We leave it to the reader to interpret what that means. Thanks for the nudge.

      (3) The significant group difference in place cell overdispersion during the pretraining phase (Figure 3C) is interesting, but some readers would appreciate additional sentences on its functional implication. Does it mean the spatial tuning of place cells was disrupted by the mPFC lesion?

      Only the reliability of spatial firing was altered, not the spatial tuning.

      (4) Although the method section described how to calculate overdispersion and SFEP, some concise, intuitive descriptions of these measures in the result section would help readers understand these results.

      Overdispersion is better explained. See lines 646-650.

      (5) I recommend adding a figure of the task performance of the rats used in the electrophysiological recording experiment and a table summarizing the number of cells recorded per animal. 

      We have included Table S2 with the cell counts and a summary of the performance for each of the rats in the electrophysiological recording experiment.

      (6) Readers would appreciate additional information on task apparatus, such as the size, appearance, and rotating speed of the arena, as well as stationary cues available in the room. 

      This is now provided in the Methods under “Active place avoidance task”.

      (7) Lines 425-416: "On the fourth day of the behavioral training, the rats had a single trial with the shock on to test retention of the training." Shouldn't it be "shock off"? 

      No, the shock was on to prevent extinction learning and to increase the challenge for conflict learning.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Major Concerns/Public Review

      Comment 1: There is a mild disconnect between behavioral readout (reflexive pain) and neural circuits of interest (emotional). Considering that this circuit is likely engaged in the aversiveness of pain, it would have been interesting to see how carrageenan and/or AIE impacted non-reflexive pain measures. Perhaps this would reveal a potentiated or dysregulated phenotype that matches the neurophysiological changes reported. However, this critique does not take away from the value of the paper or its conclusions.

      We agree that including measures of non-reflexive pain would enhance future studies and potentially reveal a phenotype that is closely related to the observed changes in neurophysiology.

      Minor Concerns/Recommendations

      Comment 1: There are a few minor grammatical errors in the text, mostly in the captions. A close read should be able to identify these errors.

      We have fixed what grammatical errors we found.

      Reviewer #2:

      Major Concerns/Public Review

      No major concerns.

      Minor Concerns/Recommendations

      Comment 1: If pain sensitivity was assessed at 3 time points post carrageenan administration, why were these data averaged? Were there no differences between the time points? The data from the 3 time points should be presented, either in a figure, table, or supplementary materials.

      We averaged the pain sensitivity data across the 3 time points following carrageenan administration because we were trying to present this data in a more concise manner. Pain sensitivity did change over time following carrageenan administration. We have now included the unaveraged data in figure 2 (panels D, F, H, and J).

      Comment 2: For the optically-evoked EPSCs and IPSCs, were the peak amplitudes the max responses that could be obtained? If not, how were levels of ChR2 expression or light intensity controlled for?

      The peak amplitudes for EPSCs and IPSCs were half the maximal response that could be evoked by optical stimulation. The AMPA and NMDA currents were maximal responses as prior literature indicated some PVINs have small NMDA currents, and we wanted to ensure these currents would be detected reliably. We updated our methods section to include this information in the voltage clamp recordings section.

      Comment 3: In the example traces for the aEPSC experiment, the figure legend states that the "+" symbol indicates an asynchronous event. However, there are several "|" or "-" symbols in the figure. Perhaps this is an issue with the resolution of the figure and those are supposed to be "+"s.

      We have increased the resolution of the figures to ensure that the markings of the asynchronous events display properly. We apologize for not noticing that these symbols were not displayed correctly in the original figures included in the manuscript.

      Comment 4: For the von Frey and the Hargreaves test, were animals acclimated to the apparatus in the days leading up to the first test, or was the 5-minute pre-test the only acclimation that was done? This information needs to be provided. If the latter, there is concern that the animals did not fully acclimate to the apparatus and handling prior to testing, which should be taken into consideration in the interpretation of the behavioral analyses.

      The rats underwent handling once a day for three days prior to the first von Frey and Hargreaves tests. On the day prior to the first test, rats were acclimated to the von Frey and Hargreaves apparatuses. The acclimation period consisted of a 15-min exposure to the von Frey apparatus and a 30-min exposure to the Hargreaves apparatus for each animal. This information has been added to the revised methods section under the assessment of mechanical and thermal sensitivity heading.

      Reviewer #3:

      Major Concerns/Public Review

      Comment 1: There is incomplete evidence supporting some of the conclusions drawn in this manuscript. The authors claim that the changes in feedforward inhibition onto pyramidal cells are due to the changes in parvalbumin interneurons, but evidence is not provided to support that idea. PV cells do not spontaneously fire action potentials in slices (nor do they receive high levels of BLA activity while at rest in slices). It is possible that spontaneous GABA release from PV cells is increased after AIE but the authors did not report sIPSC frequency. Second, the authors did not determine that PV cells mediate the feedforward BLA op-IPSCs and changes following AIE (this would require manipulation to reduce/block PV-IN activity). This limitation in results and interpretation is important because prior work shows BLA-PFC feedforward IPSCs can be driven by somatostatin cells. Cholecystokinin cells are also abundant basket cells in PFC and have been recently shown to mediate feedforward inhibition from the thalamus and ventral hippocampus, so it's also possible that CCK cells are involved in the effects observed here.

      The hypothesis that adolescent alcohol exposure could change spontaneous GABA release from PVINs is an interesting one that merits future exploration. Unfortunately, as the focus of this manuscript was on circuit-specific alterations in synaptic function, this experiment is somewhat outside the scope of the paper as sIPSCs and mIPSCs are not circuit specific measures of GABA activity and would not reflect spontaneous release from only GABA interneurons receiving input from the BLA. Despite this, a future study investigating spontaneous GABA release from PVINs in the PrL would be a valuable complement to the present study.

      While we did not directly manipulate PVINs to demonstrate that decreased oIPSC amplitude at PrL<sup>PAG</sup> neurons following AIE is due solely to changes in PVINs, it is notable that both the intrinsic excitability of PVINs and the BLA-driven E/I balance at PVINs were reduced following AIE. These changes would be consistent with decreased PVIN output onto PrL<sup>PAG</sup> neurons. However, we agree that this does not preclude the possibility that changes in SST or CCK interneurons contribute to the observed decrease in BLA-driven inhibition at PrL<sup>PAG</sup> neurons following AIE. As such, we have altered the wording in the discussion to indicate that reduced BLA-driven feedforward inhibition of PrL<sup>PAG</sup> neurons may be related, at least in part, to the observed changes in PVINs.

      Comment 2: The authors conclude that the changes in this circuit likely mediate long-lasting hyperalgesia, but this is not addressed experimentally. In some ways, the focused nature of the study is a benefit in this regard, as there is extensive prior literature linking this circuit with pain behaviors in alternative models (e.g., SNI), but it should be noted that these studies have not assessed hyperalgesia stemming from prior alcohol exposure. While the current studies do not include a causative behavioral manipulation, the strength of the association between BLA-PL-PAG function and hyperalgesia could be bolstered by current data if there were relationships detected between electrophysiological properties and hyperalgesia. Have the authors assessed this? In addition, this study is limited by not addressing the specificity of synaptic adaptations to the BLA-PL-PAG circuit. For instance, PL neurons send reciprocal projections to BLA and send direct projections to the locus coeruleus (which the authors note is an important downstream node of the PAG for regulating pain).

      We have not assessed correlations between the electrophysiological properties and hyperalgesia. We feel that future studies using DREADDs to perform cell-type and circuit-specific manipulations can better address the involvement of this circuitry in long-lasting hyperalgesia following AIE. With respect to the circuit specificity of the observed changes, we have previously evaluated the effects of AIE on pyramidal neurons projecting from the PrL to the BLA (PrL<sup>BLA</sup>). We found that following AIE exposure there was no change in the intrinsic excitability of these neurons. In addition, the amplitude and frequency of sEPSCs and sIPSCs onto PrL<sup>BLA</sup> neurons were unchanged. While these results did not assess whether the BLA-PrL-BLA circuit undergoes synaptic adaptations similar to those observed in the BLA-PrL-vlPAG circuit, it is notable that the intrinsic excitability of PrL<sup>BLA</sup> neurons was unchanged following AIE exposure. This indicates that the effects of AIE on the intrinsic excitability of pyramidal neurons in the PrL may be circuit specific. We agree that it would be interesting to study the effect of AIE on PrL neurons that project to the locus coeruleus; however, due to the well-defined role of the BLA-PrL-vlPAG circuit in pain, we chose to evaluate this circuit first.

      Comment 3: I have some concerns about methodology. First, 5-ms is a long light pulse for optogenetics and might induce action-potential independent release. Does TTX alone block op-EPSCs under these conditions? Second, PV cells express a high degree of calcium-permeable AMPA receptors, which display inward rectification at positive holding potentials due to blockade from intracellular polyamines. Typically, this is controlled/promoted by including spermine in the internal solution, but I do not believe the authors did that. Nonetheless, the relatively low A/N ratios for this cell type suggest that CP-AMPA receptors were not sampled with the +40/+40 design of this experiment, raising concerns that the majority of AMPA receptors in these cells were not sampled during this experiment. Finally, it should be noted that asEPSC frequency can also reflect changes in a number of functional/detectable synapses. This measurement is also fairly susceptible to inter-animal differences in ChR2 expression. There are other techniques for assessing presynaptic release probability (e.g., PPR, MK-801 sensitivity) that would improve the interpretation of these studies if that is intended to be a point of emphasis.

      When we included TTX but not 4-AP we did not observe any optically evoked responses, so we don’t believe that the 5-ms pulse induced action-potential independent release in these experiments. With respect to the second point, we did not include spermine in the internal solution for the AMPA/NMDA recordings in PVINs, and it is possible that endogenous polyamines interfered with recording CP-AMPA receptors in the +40/+40 design. To address this concern, we recalculated the AMPA/NMDA ratio for PVINs using data from an optically evoked AMPA current that was collected while holding the cell at -70 mV. This data was collected at the end of the +40/+40 recording protocol as we were interested in assessing whether there would be any difference in the ratio of the +40/-70 AMPA current across treatment conditions. As there was no observed difference in the +40/-70 AMPA current ratio across treatment groups, we had originally used the +40 AMPA current for calculating the AMPA/NMDA ratio for PVINs to make the methods for calculating this ratio uniform for both PVINs and PrL<sup>PAG</sup> neurons. The methods, results, and Fig. 10 have been updated to reflect the recalculated AMPA/NMDA ratio for PVINs. Notably, only the significance of the AIE x carrageenan interaction was altered by the change in the way the AMPA/NMDA ratio was calculated. Originally, this interaction displayed a trend toward significance (p = 0.0501); however, when the recalculated AMPA/NMDA ratio was analyzed, this interaction term became significant (p = 0.0131). We have also added the +40/-70 AMPA ratio to figure 10 as it might be of interest.
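      The two ratios discussed in this response can be illustrated with a minimal sketch. It assumes each current has already been reduced to a single peak amplitude in pA; the function names and the example amplitudes are hypothetical and are not taken from the manuscript, and the sketch does not reproduce how the peaks themselves were measured.

```python
def ampa_nmda_ratio(ampa_peak_pa, nmda_peak_pa):
    """AMPA/NMDA ratio from peak amplitudes (pA).

    Magnitudes are used because an AMPA current recorded at -70 mV is
    inward (negative) while an NMDA current recorded at +40 mV is outward.
    """
    if nmda_peak_pa == 0:
        raise ValueError("NMDA peak amplitude must be non-zero")
    return abs(ampa_peak_pa) / abs(nmda_peak_pa)


def ampa_plus40_minus70_ratio(ampa_plus40_pa, ampa_minus70_pa):
    """Ratio of the AMPA current magnitude at +40 mV to that at -70 mV."""
    if ampa_minus70_pa == 0:
        raise ValueError("AMPA peak at -70 mV must be non-zero")
    return abs(ampa_plus40_pa) / abs(ampa_minus70_pa)


# Hypothetical amplitudes for one cell, for illustration only.
ampa_minus70 = -120.0  # pA, inward AMPA current at -70 mV
ampa_plus40 = 48.0     # pA, outward AMPA current at +40 mV
nmda_plus40 = 60.0     # pA, outward NMDA current at +40 mV

print(ampa_nmda_ratio(ampa_minus70, nmda_plus40))
print(ampa_plus40_minus70_ratio(ampa_plus40, ampa_minus70))
```

      Under this sketch, a smaller outward AMPA current at +40 mV relative to the inward current at -70 mV lowers the +40/-70 ratio, which is the pattern the reviewer describes for inwardly rectifying CP-AMPA receptors.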

      Finally, the point regarding aEPSC frequency reflecting not only release probability but also the number of functional/detectable synapses is an important consideration. For this manuscript, we intentionally selected aEPSC frequency for this reason. As the BLA to PrL projection continues to mature during adolescence, the number of BLA contacts onto GABA neurons in the PrL increases. Thus, we thought that it was possible that AIE would alter the number of detectable BLA inputs onto PVINs. We acknowledge that as this measure is sensitive to differences in ChR2 expression between animals/slices it can be difficult to interpret. We also agree that in the future it would be beneficial to include either PPR or MK-801 sensitivity to improve interpretability.

      Comment 4: In a few places in the manuscript, results following voluntary drinking experiments (especially Salling et al. and Sicher et al.) are discussed without clear distinction from prior work in vapor models of dependence.

      We have altered the manuscript to specifically note where voluntary drinking was used rather than vapor models.

      Comment 5: Discussion (lines 416-420). The authors describe some differing results with the literature and mention that the maximum current injection might be a factor. To me, this does not seem like the most important factor and potentially undercuts the relevance of the findings. Are the cells undergoing a depolarization block? Did the authors observe any changes in the rheobase or AP threshold? On the other hand, a more likely difference between this and previous work is that the proportion of PAG-projecting cells is relatively low, so previous work in L5 likely sampled many types of pyramidal cells that project to other areas. This is a key example where additional studies by the current group assessing a distinct or parallel set of pyramidal cells would aid in the interpretation of these results and help to place them within the existing literature. Along these lines, PAG-projecting neurons are Type A cells with significant hyperpolarization sag. Previous studies showed that adolescent binge drinking stunts the development of HCN channel function and ensuing hyperpolarization sag. Have the authors observed this in PAG-projecting cells? Another interesting membrane property worth exploring with the existing data set is the afterhyperpolarization / SK channel function.

      In discussing the maximum current injection as a factor in differing results on intrinsic excitability, we were principally considering how the additional data points increase the power of the analysis and thus the likelihood of detecting an effect. In focusing on this, however, we ignored other relevant and interesting factors that we should also have discussed. Additional analyses examining HCN and SK channel function have now been added to the manuscript and incorporated into the results section under the heading Adolescent Intermittent Ethanol Exposure and Carrageenan Enhanced the Intrinsic Excitability of Prelimbic Neurons Projecting to the Ventrolateral Periaqueductal Gray. We have also modified the third paragraph in the discussion to add additional context. Additional information on the biophysical properties of the neurons has been added to Figure 4.

      Minor Concerns/Recommendations

      Comment 1: Subheadings are vague. "Analysis of..." Should be rephrased to use active voice to describe key findings.

      The subheadings have been rephrased to describe key findings.

      Comment 2: Consider altering or consolidating the figure layout for clarity. For instance, it would be helpful for aEPSCs to be near the AMPA and NMDA experiments. The feedforward IPSCs could also be with the PV-IN recordings. This would be helpful in developing a cohesive picture of key findings. To that end, a working model or graphical abstract would be helpful.

      It doesn’t appear that this journal allows graphical abstracts, but we have added a model that summarizes the principal findings in the discussion.

      Comment 3: There are a lot of statistics punctuating the text in the Results. It can be hard to parse at times.

      We considered moving the statistics to tables, but this became unwieldy.

      Comment 4: The Discussion is quite long (10 paragraphs). Suggest consolidating to 3-4 most salient points.

      We appreciate this comment and have made some edits to the discussion, albeit without consolidating it to only 3-4 points.

    1. Author response:

      eLife Assessment

      The authors provide a useful summary of ten years of Brain Initiative funding including the historical development, the specific funding mechanisms, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed. The evidence for impact is incomplete due to the omission of a comparison group of funded grants.

      In this combined version, we include a comparison group of non-BRAIN Initiative R01s derived from the parent notice of funding opportunity from FY2014-2022. We performed a bibliometric analysis of the publications, citations, RCR and budget productivity measure of the non-BRAIN parent R01s.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a convincing description of approximately ten years of funding from the NIH BRAIN initiative. It is of particular value at this moment in history, given the cataclysmic changes in the US government structure and function occurring in early 2025.

      Strengths:

      The paper contains a fair bit of documentation so that the curious reader can actually parse what this BRAIN program funded.

      Weaknesses:

      There are too many acronyms, and the manuscript reads as if it were an internal NIH document, where the audience knows all of the NIH nomenclature and program details. It is not particularly friendly to the outside, lay reader.

      In this version, we have attempted to minimize acronyms and explain NIH nomenclature and program details to make it more accessible to readers not familiar with NIH terminology.

      Reviewer #2 (Public review):

      Summary:

      The authors provide an important summary of ten years of Brain Initiative funding including a description of the historical development of the initiative, the specific funding mechanisms utilized, and examples of grants funded and work produced. The authors also conduct analyses of the impact on overall funding in Systems and Computational Neuroscience, the raw and field normalized bibliographic impact of the work, the social media impact of the funded work, and the popularity of some tools developed.

      Strengths:

      This is a useful perspective on an important funding initiative over a ten-year period. It is clearly written and the illustrations and analyses are mostly useful for understanding the impact of the initiative.

      Weaknesses:

      The major limitation is that the bibliographic analysis does not provide a comparison group of funded grants. Because work that successfully competes for funding is likely to be more impactful than all work in a given area, the normalization of citations to field medians may reflect this "grant review" effect, rather than anything special about the Brain Initiative. Hopefully, this speculation is incorrect (I would guess that it is), but it would be helpful to try to demonstrate this more directly by including a funded comparison group.

      In this version, we have provided a comparison group of parent R01s that are not funded through the BRAIN Initiative from FY2014-2022 in Figure 3. We include publication metrics and budget efficiency measures for this comparison group.  

      There are also minor inconsistencies in the numbering of the figures that need to be cleared up.

      We have updated the figure numbers.

    1. Author response:

      eLife Assessment 

      The manuscript presents some useful accounts of experiences funding team projects within the BRAIN Initiative. These would be more appropriate to add to the companion manuscript since the present manuscript contains some overlapping analyses and does not stand well on its own. Therefore the evidence supporting the conclusions is incomplete. 

      We appreciate the feedback on merging both manuscripts into one and have followed the advice in this version. 

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In this useful narrative, the authors attempt to capture their experience of the success of team projects for the scientific community.  

      Strengths: 

      The authors are able to draw on a wealth of real-life experience reviewing, funding, and administering large team projects, and assessing how well they achieve their goals. 

      Weaknesses: 

      The utility of the RCR as a measure is questionable. I am not sure if this really makes the case for the success of these projects. The conclusions do not depend on Figure 1. 

      We respectfully disagree about the utility of the RCR, particularly because it is a metric that is normalized by both year and topical area. We have added a more detailed description of how the RCR is calculated on pages 6-7. Please note that figure 1 is aimed to highlight the funding opportunities, investments and number of awards associated with small lab (exploratory) versus team (elaborated, mature) research rather than a description of publication metrics.

      Reviewer #2 (Public review): 

      Summary: 

      The authors review the history of the team projects within the Brain initiative and analyze their success in progression to additional rounds of funding and their bibliographic impact. 

      Strengths: 

      The history of the team projects and the fact that many had renewed funding and produced impactful papers is well documented. 

      Weaknesses: 

      The core bibliographic and funding impact results have largely been reported in the companion manuscript and so represent "double dipping." I presume the slight disagreement in the number of grants (by one) represents a single grant that was not deemed to address systems/computational neuroscience. The single figure is relatively uninformative. The domains of study are sufficiently large and overlapping that there seems to be little information gained from the graphic and the Sankey plot could be simply summarized by rates of competing success.

      While we sincerely appreciate the feedback, we chose to retain these plots on domains and models to provide a sense of the broad spectrum of research topics contained in our TeamBCP awards. Further details on the awards can be derived from the award links provided in the text. Additionally, we retained the Sankey plots because these are a visual depiction of how awards transition from one mechanism to another, evolve in their funding sources, and advance in their research trajectories. The plot is an example of our continuity analysis which is only reported in the text and not visually shown for the remaining BCP programs.

    1. Author response:

      We thank the reviewers for their detailed and constructive comments on our manuscript entitled “Activity-Dependent Changes in Ion Channel Voltage-Dependence Influence the Activity Patterns Targeted by Neurons.” We appreciate the time and effort the reviewers invested in critiquing our work and are grateful for the opportunity to clarify and improve our manuscript.

      As noted by the reviewers, the main message of the manuscript is that the intrinsic properties and activity characteristics of targeted bursters depend on the timescale of half-(in)activation alterations in the homeostatic mechanism. However, the concerns of the reviewers reveal that the manuscript is organized in ways that detract from this message. Below we respond to the points the reviewers raise and close by outlining the changes that we will make to the manuscript as a result. Our goal will be to streamline the message of the paper while addressing the concerns of the reviewers.

      Response to Reviewer #1:

      Point 1: We interpret the reviewer’s question about “mechanism” to be: why do half-(in)activation alterations redirect degenerate bursters to different parameter regions? (A separate aspect of “mechanism,” namely how these alterations might be biologically implemented, is already addressed in the paper.)

      We speculate that Figure 3 illustrates this process. As conductance densities slowly evolve, rapid half-(in)activation changes cause the sensor variable (α) to jump abruptly as it searches for a voltage-dependence configuration that meets calcium targets (Figure 3A). The channel densities are then slightly altered and the process repeats. Slowing the half-(in)activation alterations reduces these abrupt fluctuations (Figure 3B). Making the alterations infinitely slow effectively removes half-(in)activation changes altogether, leaving the system reliant solely on slower alterations in maximal conductances (Figure 3C). Because each timescale of half-(in)activation produces a different channel repertoire at each time step, the neuron follows distinct trajectories through the space of activity characteristics and intrinsic properties over the long term.
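      The intuition in this paragraph can be sketched with a deliberately simplified toy system. This is not the model used in the manuscript: it has a single conductance `g`, a single half-activation voltage `v_half`, and a scalar "activity" that stands in for the calcium sensor. All parameter values, the sigmoid form, and the names `simulate`, `gain`, `v_ref` and `k` are assumptions chosen for illustration. Both variables integrate the same error signal, so the final state depends on the ratio of their timescales.

```python
import math


def simulate(tau_half, tau_g=100.0, target=0.5, t_end=5000.0, dt=0.05):
    """Toy two-timescale homeostat (illustrative only, not the authors' model).

    activity = g * sigmoid(v_half); both g and v_half are adjusted by the
    same error (target - activity), on timescales tau_g and tau_half.
    """
    g = 1.0                 # maximal conductance (arbitrary units)
    v_half = -25.0          # half-activation voltage (mV)
    v_ref, k = -30.0, 5.0   # hypothetical reference voltage and slope (mV)
    gain = 10.0             # hypothetical scale of the voltage shift (mV)
    for _ in range(int(t_end / dt)):
        s = 1.0 / (1.0 + math.exp((v_half - v_ref) / k))
        error = target - g * s
        g += dt * g * error / tau_g                # slow change in density
        v_half -= dt * gain * error / tau_half     # shift in voltage-dependence
    activity = g / (1.0 + math.exp((v_half - v_ref) / k))
    return g, v_half, activity


# Fast, comparable, and infinitely slow half-activation alterations.
for label, tau in [("fast", 1.0), ("slow", 100.0), ("off", math.inf)]:
    g, v_half, activity = simulate(tau)
    print(f"{label:>4}: g={g:.3f}  v_half={v_half:.2f} mV  activity={activity:.3f}")
```

      In this toy, every run settles at the same target activity, but with a different combination of `g` and `v_half`: fast voltage-dependence changes absorb most of the error before the conductance moves, whereas an infinite `tau_half` leaves the conductance to do all the work.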

      Point 2: We appreciate the reviewer’s skepticism regarding our statistical approach with the “Group of 5” and “Group of 20.” These groups arose from historical aspects of our analysis and this analysis does not directly advance the main point—that changes in the timescale of channel voltage-dependence alterations impact the properties of bursters to which the homeostatic mechanism converges. Therefore, we plan to remove the references to the Group of 5 and focus on how the Group of 20 responds to variations in the timescale of voltage-dependent alterations.

      Point 3: Our paper claims that the half-(in)activation mechanism is subordinate to the maximal conductance mechanism. We agree with the reviewer that making this claim requires more care. The simulations we run are controls in the spirit described below.

      The reviewer notes that in our simulations, half-(in)activations are already near the range required for bursting, which forces maximal conductances to undergo larger changes and thus appear more critical. We note, however, that the opposite can also occur: if half-(in)activation values were already positioned in ranges required for bursting, an arrangement of small maximal conductances may potentially produce bursting. The latter might give the impression that maximal conductance alterations and half-(in)activation alterations are equally important. The simulations we ran simply suggested this wasn’t true for these models.

      Points 4 - 6: In Point 4, the reviewer highlights that model choices (e.g., constraints on maximal conductance and half-(in)activation, use of the L8 norm) are not clearly justified. In Point 5, the reviewer suggests that the paper provides excessive detail about other model choices. Point 6 appears to reiterate concerns about insufficient justification for some modeling decisions.

      Our intent was to acknowledge every caveat, which led us to include a long section on Model Assumptions in the Discussion. However, as Point 5 notes, this makes the Discussion cumbersome. The Discussion should focus on remarks regarding the impact that the timescale of half-(in)activation alterations has on the family of bursters targeted by the homeostatic mechanism. Consequently, we will relocate the extended discussion of model assumptions from the Discussion to the Methods section. This section already touches on how the constraints on half-(in)activation alterations compare to earlier versions of the model (noted in Point 6) and will be expanded to further explain our choice of the L8 norm (Point 4).
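      For readers unfamiliar with the term, the following is a minimal sketch of a p-norm, assuming that "L8 norm" here refers to the standard p-norm with p = 8. How the norm enters the model is not described in this response, so the function name `lp_norm` and the example vector are purely illustrative. A high-order norm such as p = 8 is dominated by the largest component and behaves like a smooth approximation to the maximum.

```python
def lp_norm(values, p=8):
    """Standard p-norm: (sum of |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Hypothetical error vector, for illustration only.
errors = [1.0, 0.5, 0.25]

print(lp_norm(errors, p=2))  # Euclidean norm, all components contribute
print(lp_norm(errors, p=8))  # close to the largest component
print(max(abs(e) for e in errors))
```

      The comparison with `p=2` shows the design trade-off: lower-order norms spread the weight across all components, while higher-order norms penalize mainly the worst one.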

      Response to Reviewer #2:

      Weakness 1: The reviewer notes that the writing is “rather confusing.” This likely arises from the fact that we did not consistently emphasize the core message: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by the homeostatic mechanism. We will address this by reorganizing the manuscript to make that focus clearer, and we outline these planned revisions at the end of these responses.

      The reviewer specifically points out that the state-of-the-art is not clearly articulated. We will reorganize the Introduction to highlight this. Briefly, work on activity-dependent homeostasis has historically focused on changes in channel density. This is supported by experiment and has been modelled theoretically. In comparison, changes in channel voltage-dependence, while documented, are less explored due to the challenges of measuring them. In this work, we attempt to study the impact that alterations in channel voltage-dependence have on activity-dependent homeostasis. To do this, we extend existing computational models of activity-dependent homeostasis—models that have hitherto only altered channel density—by incorporating a mechanism that also adjusts channel voltage-dependence.

      Weakness 2: The Discussion highlights two potential implications of our findings—one for neuronal development and another for activity recovery following perturbations. However, they were outlined after the Model Assumptions section which, as Reviewer 1 points out, is quite detailed and cumbersome.

      Another aspect that may contribute to the challenge of interpreting our results is our conceptual approach to neuronal excitability, which relies on a computational model of activity-dependent homeostasis that abstracts much of the underlying biochemistry. Our message is general: the timescale of half-(in)activation alterations influences the intrinsic properties and activity characteristics of bursters targeted by a homeostatic mechanism. As such, the implications are general. Their value lies in circumscribing a conceptual framework from which experimentalists may devise and test new hypotheses. We do not aim to predict or explain any specific phenomenon in this work. To address this concern, however, we will expand our discussion of how these findings may guide experimental considerations, particularly regarding neuronal development and activity recovery during perturbations, to better illustrate the practical utility of our results.

      Response to Reviewer #3:

      Point 1: This reviewer suggests that our core message—namely, that the timescale of half-(in)activation alterations affects the intrinsic properties and activity patterns targeted by a homeostatic mechanism—should also apply during perturbations. We plan to address this by extending our analysis on the Group of 20 models. We will perturb activity by increasing extracellular potassium concentration and change the timescale of half-(in)activation alterations during the perturbation. This should underscore how the neuron’s stabilized activity pattern depends on this timescale, reinforcing our central message.

      Point 2: In this part of the Discussion, we noted that multiple half-activation shifts collectively shape the neuron’s global properties, and that averaging might obscure these effects. However, in light of the reviewers’ comments, we recognize that this observation alone does not directly advance the paper’s main message. To make it relevant, we would need to (1) identify correlations between intrinsic parameters (i.e., half-(in)activation and maximal conductance) and the resulting activity patterns, and (2) examine how these correlations shift under different timescales of half-(in)activation alterations. Since we have not performed that analysis, we will revise this part of the Discussion to clarify its connection to the paper’s principal focus by noting that a deeper exploration of this notion using correlations will be the topic of future work.

      Conclusion: We outline updates we will make to the paper here.

      Introduction: In response to Reviewer 2, we will provide a clearer explanation of the state-of-the-art in activity-dependent homeostasis and highlight our specific contribution. We will emphasize that our conclusions, while generic, are relevant in experimental contexts.

      Results: We will reorganize this section to underscore the main point: the timescale of half-(in)activation alterations affects the intrinsic properties and activity characteristics of bursters in the homeostatic mechanism. Figure 1 will remain as it is. It shows assembly from random initial conditions, and we will explain that for these simulations we must always consider the half-(in)activation mechanism together with a mechanism that alters maximal conductances, as half-(in)activation alterations alone cannot form bursters. Figure 2 will remain as is, but we will remove any discussion of the “Group of 5,” addressing Reviewer 1’s feedback. What is presently Figure 4 will then follow, illustrating how timescale differences shape the properties of 20 degenerate solutions. We will then present Figure 3 to address Reviewer 1’s critique on mechanism. Here we will explain how different timescales of half-(in)activation alteration cause the homeostatic mechanism to update channel properties differently, leading to distinct trajectories through the space of intrinsic properties and activity characteristics (as described in the response to Point 1 of Reviewer 1’s feedback). Finally, following Point 1 of Reviewer 3, we will add a new figure highlighting the role of half-(in)activation timescale during perturbation.

      Discussion: To streamline the Discussion, the “Model Assumptions” section will be moved to Methods. In line with Point 2 of Reviewer 3, we will clarify how the concept of "small half-(in)activation shifts lead to global changes in neuronal properties" aligns with our core message. Additionally, following Reviewer 2’s comments, we will expand our discussion of implications by including how experimentalists might use our findings to inform studies on perturbations and development.

      Methods: We will expand “Model Assumptions” to explain in more detail why we chose the L8 norm.
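
      As a minimal sketch of what an L8 norm computes, assuming the term denotes the standard p-norm with p = 8 (how the norm enters the homeostatic model is not stated in this response), the example below compares it with the L2 norm and the maximum absolute value. The function name `p_norm` and the `deviations` values are hypothetical and chosen only for illustration.

```python
def p_norm(values, p=8):
    """Return the p-norm (sum of |x|^p) ** (1/p) of a sequence of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Hypothetical deviations of parameters from a reference value (illustrative only).
deviations = [0.1, -0.4, 1.0, 0.3]

l2 = p_norm(deviations, p=2)               # ordinary Euclidean norm
l8 = p_norm(deviations, p=8)               # L8 norm, assuming p = 8
largest = max(abs(v) for v in deviations)  # max norm, the limit as p grows

print(f"L2 = {l2:.4f}, L8 = {l8:.4f}, max = {largest:.4f}")
```

      For a high exponent such as p = 8, the norm is dominated by the largest entry, so under this assumption it behaves like a smooth approximation of the maximum deviation rather than an average over all entries.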

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data.

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below we address each point raised:

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.)

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we will clarify in both the Results and Discussion that the observed decline is based on a subset of animals. We will also state that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable, with at least one case showing an increase in signal beyond two years.

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient.

      We thank the reviewer for these helpful suggestions. In response, we will revise the relevant figures as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We will also provide a supplementary table listing the animal ID and brain regions for each data point shown in the graphs.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight.

      We thank the reviewer for raising this important issue. We agree that injection volume is a potentially confounding variable. In response, we will conduct an exploratory analysis including volume as an additional factor. We will also expand the Discussion to highlight the need for future systematic evaluation of injection volume, especially in relation to immune responses or transduction efficiency in different brain regions.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only.

      We appreciate this important clarification. In response, we will revise the title to “Factors influencing peak DREADD expression levels”, and we will specify that our analysis focused on peak ΔBPND values around 60 days post-injection. We will also explicitly distinguish these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #2 (Public review):

      Weaknesses

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that all things were not planned and equated, creating a lot of unexplained variances in the data. This was yet judiciously used by the authors, but one might think that planned and organized multicentric experiments would provide more information and help test more parameters, including some related to inter-individual variability, and particular genetic constructs.

      We thank the reviewer for bringing this important point to our attention. We fully agree that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability due to differences in constructs, injection sites, and timelines. While this reflects the real-world constraints of long-term NHP research, we acknowledge the need for more standardized approaches. We will add a statement in the revised Discussion emphasizing that future multicenter and harmonized studies would be valuable for systematically examining specific parameters and inter-individual variability.

      Reviewer #3 (Public review):

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision.

      These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions noted in the “Recommendations for the authors”. In response, we will carefully review and revise the manuscript to improve visualization and quantification.

  3. Mar 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on the reflectivity properties of brochosomes, the authors provide very good evidence that these nanostructures indeed reduce the reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for the proper development and function of brochosomes. In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide some phylogenetic sequence analyses and speculate about the evolution of these essential genes.

      Strengths:

      The study is very comprehensive including careful optical measurements, EM and TM analysis of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests, and knock-down experiments to identify essential proteins. Indeed, the results are very convincingly in line with the starting hypothesis such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen, which identified four proteins that are essential for the brochosome structure (BSMs). After respective RNAi knock-downs, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that the RNAi knock-down leads to a change of brochosome structure, which reduces reflectivity, which in turn leads to a decrease of the antipredatory effect.

      Thank you very much for your positive feedback and insightful comments on our manuscript. We are delighted that you acknowledge the efforts we have made in studying the components and functions of Brochosomal proteins. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings identified in our original submission. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      The reduction of reflectivity by aberrant brochosomes or after ageing is only around 10%. This may seem little to have an effect in real life. On the other hand, the in vivo predation tests confirm an influence. Hence, this is not a real weakness of the study - just a note to reconsider the wording for describing the degree of reflectivity.

      Thank you for your valuable suggestions. Based on your recommendations, we have revised the manuscript accordingly. Although the absolute reduction in light reflection due to Brochosomal coverage is approximately 10%, the relative decrease in light reflection on the leafhopper's surface is nearly 30%. Specifically, in the ultraviolet region, the reflection is reduced from about 30% to 20%, and in the visible light region, it is reduced from 20% to 10%. For detailed revisions, please refer to lines 151-156 of the revised manuscript.

      The single gene knockdowns seemed to lead to a very low penetrance of malformed brochosomes (Figure Supplement 3). Judging from the overview slides, less than 1% of brochosomes may have been affected. A quantification of regular versus abnormal particles in both, wildtype and RNAi treatments would have helped to exclude that the shown aberrant brochosomes did not just reflect a putative level of "normal" background defects. Of note, the quadruple knock-down of all BSMs seemed to lead to a high penetrance (Figure 4), which was already reflected in the microtubule production line. While the data shown are convincing, a quantification might strengthen the argument.

      While the RNAi effects seemed to be very specific to brochosomes and therefore very likely specific, an off-target control for RNAi was still missing. Finding the same/similar phenotype with a non-overlapping dsRNA fragment in one off-target experiment is usually considered required and sufficient. Further, the details of the targeted sequence will help future workers on the topic.

      Thank you for your valuable suggestions. Based on your recommendations, we have synthesized dsRNA targeting two non-overlapping regions of the coding sequences for four Brochosomal structural protein genes. These dsRNAs were injected individually and in combination for each gene. Our RNAi experiments for each BSM gene demonstrated that both individual and combined injections significantly suppressed the expression of the target genes, with the combined injection yielding slightly better silencing efficiency. Statistical analysis of the SEM observations revealed that the combined injection of dsRNAs targeting two non-overlapping regions led to a 60-70% reduction in the surface area coverage of Brochosomes. Additionally, approximately 20% of the remaining Brochosomes exhibited significant morphological changes. For detailed revisions, please refer to lines 199-211 of the revised manuscript, as well as Figures 3A and 3C, and Supplementary Figures 4 and 5.

      The main weakness in the current manuscript may be the phylogenetic analysis and the model of how the genes evolved. Several aspects were not clearly or consistently stated such that I felt unsure about what the authors actually think. For instance: Are all the 4 BSMs related to each other or only BSM2 and 3? If so, not only BSM2 and 3 would be called "paralogs" but also the other BSMs. If they were all related, then a phylogenetic tree including all BSMs should be shown to visualize the relatedness (including the putative ancestral gene if that is the model of the authors). Actually, I was not sure about how the authors think about the emergence of the BSMs. Are they real orphan genes (i.e. not present outside the respective clade) or was there an ancestral gene that was duplicated and diverged to form the BSMs? Where in the phylogeny does the first of the BSMs or ancestral proteins emerge (is the gene found in Clastoptera arizonana the most ancestral one?)? Maybe, the evolution of the BSMs would have to be discussed individually for each gene as they show somewhat different patterns of emergence and loss (BSM4 present in all species, the others with different degrees of phylogenetic restriction).

      Thank you very much for your constructive feedback on our phylogenetic analysis and the modeling of gene evolution. We fully agree with your insights and acknowledge that the evolutionary analysis of BSM genes remains somewhat ambiguous. This ambiguity is primarily due to the limited research on the precise structural protein composition of Brochosomes. While proteomics studies have analyzed and discussed the structural proteins of Brochosomes, the accurate composition of these proteins is still poorly understood. In this study, we identified four BSM proteins, but given the intricate structure of Brochosomes as proteinaceous spheres, we believe there may be additional BSM genes that have not yet been identified. Moreover, despite the presence of over ten thousand species within the Cicadomorpha, only three species have genome sequences available, and fewer than a hundred species have transcriptome sequencing data. The scarcity of research on Brochosomes, as well as the limited availability of genomic and transcriptomic data, poses significant challenges for our phylogenetic analysis and understanding of BSM gene evolution.

      Based on your suggestions, we have revised the manuscript accordingly. Specifically, we have updated Figure 5C by including ten additional species from Cercopoidea, Cicadoidea, and Fulgoroidea to better illustrate that BSM genes are true orphan genes. We have also added a phylogenetic tree of BSM genes within Cicadidae in Supplementary Figure 3. Additionally, we have expanded the discussion of BSM gene evolution in the manuscript (lines 503-556). For detailed revisions, please refer to Figure 5C, Supplementary Figure 3, and lines 507-585 of the revised manuscript.

      Related to these questions I remained unsure about some details in Figure 5. On what kind of analysis is the phylogeny based? Why are some species not colored, although they are located on the same branch as colored ones? What is the measure for homology values - % identity/similarity? The homology labels for Nephotettix cincticeps and N. virescens seem to be flipped: the latter is displayed with 100% identity for all genes with all proteins while the former should actually show this. As a consequence of these uncertainties, I could not fully follow the respective discussion and model for gene evolution.

      Thank you very much for your insightful comments and suggestions. We have carefully considered your feedback and have thoroughly revised our manuscript accordingly. Specifically, we have enhanced the description of the phylogenetic analysis process to provide greater clarity and transparency, with the detailed methods now included in lines 789-798. Regarding Figure 5C, we appreciate your attention to the coloring scheme. We would like to clarify that the family Cicadellidae comprises 25 subfamilies, many of which are represented by only one species in our figure. To ensure clarity and meaningful representation, we have chosen to color only those subfamilies with more than three species, thereby avoiding visual clutter and emphasizing the most relevant taxonomic groups. Additionally, we have corrected the inverted homology labels for Nephotettix cincticeps and Nephotettix virescens to ensure the accuracy and consistency of our data presentation.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosome system. The results fully support their claims - only the quantification of the penetrance in the RNAi experiments would be helpful to strengthen the point. The authors' analysis of the evolution of BSM genes remained a bit vague and I remained unsure about their respective conclusions.

      The work is a very interesting study case of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

      Reviewer #1 (Recommendations for the authors):

      Main manuscript:

      Please consider the annotated pdf with suggestions for wording and comments at the authors' discretion:

      Thank you very much for your detailed suggestions and comments provided in the annotated PDF. We have carefully reviewed each of your points and have revised the manuscript accordingly. All changes have been highlighted in red text for your convenience. The revised manuscript with tracked changes is available for your review. We believe these revisions have improved the clarity and quality of our manuscript. Thank you again for your valuable feedback.

      Supplementary Figure 2 C:

      Y-axes:

      - label: "surface coverage in %"

      - there are different scale values for the different days (e.g. 80-105 for day 5 and 0-80 at day 25). As a comparison between days is interesting, it would help to have the same scale values for all. That would show the decrease more intuitively.

      Thank you very much for your suggestion regarding the Y-axis in Supplementary Figure 2C. We agree that using a consistent scale across all time points is essential for clear and intuitive comparison. In the revised manuscript, we have standardized the Y-axis scale for Supplementary Figure 2C to a uniform range of 0-100% for all days. This change allows for a more straightforward visualization of the decreasing trend in surface coverage over time.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can be used as a camouflage coating to reduce the leafhoppers' observability to their predators. The design of the experiments is novel.

      We are extremely grateful for your positive feedback and insightful comments on our manuscript. We are delighted that you have recognized the efforts we have put into our research on how brochosomes serve as a camouflage coating to reduce the detectability of leafhoppers to their predators. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings of the original version. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      Thank you very much for your valuable suggestions. We appreciate your interest in the reduction of brochosomal density on the surface of leafhoppers after 25 days. We believe that the primary reason for the decreased density of brochosomes on the leafhopper surface after 25 days is the reduced synthesis and secretion of brochosomes. The Malpighian tubules are the main sites for brochosome synthesis. As shown in Figure 2D and Supplementary Figure 1, the thick glandular segments of the Malpighian tubules in both male and female leafhoppers begin to atrophy 15 days after reaching adulthood. This indicates a gradual decline in brochosome synthesis and secretion after day 15 of adulthood. Following your suggestion, we have revised the discussion section of the manuscript to elaborate on this observation. The detailed changes can be found in lines 474-491 of the revised manuscript.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      Thank you very much for your valuable suggestions. Following your advice, we have successfully expressed four BSM genes in a prokaryotic system, purified the corresponding proteins, and applied them to quartz glass surfaces. We then measured the light reflectance of the quartz glass surfaces coated with these purified proteins. The results showed that the purified BSM proteins did not exhibit better antireflective properties compared to the control GST protein. For more details, please refer to Supplementary Figure 8 in the revised manuscript. We believe that the excellent antireflective properties of brochosomes are fundamentally due to their unique geometric shapes. The hollow pores within the brochosomes, with diameters of approximately 100 nm, are significantly smaller than most wavelengths in the visible spectrum. When light passes through these tiny pores, diffraction occurs, while light passing through the ridges of the brochosomes causes scattering. The interference between the diffracted and scattered light from these pores and ridges results in the observed extinction characteristics of brochosomes. We have incorporated these insights into the discussion section of the revised manuscript (lines 416-425 and lines 432-442 of the revised manuscript).

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      Thank you very much for your valuable suggestions. Based on your advice, we have included a detailed discussion on how reducing ultraviolet (UV) reflection can help insects avoid predation. The revised content can be found in lines 445-460 of the revised manuscript.

      “UV light serves as a crucial visual cue for various insect predators, enhancing foraging, navigation, mating behavior, and prey identification (Cronin & Bok, 2016; Morehouse et al., 2017; Silberglied, 1979). Predators such as birds, reptiles, and predatory arthropods often rely on UV vision to detect prey (Church et al., 1998; Li & Lim, 2005; Zou et al., 2011). However, UV reflectance from insect cuticles can disrupt camouflage, increasing the risk of detection and predation, as natural backgrounds like leaves, bark, and soil typically reflect minimal UV light (Endler, 1997; Li & Lim, 2005; Tovee, 1995). To mitigate this risk, insects often possess anti-reflective cuticular structures that reduce UV and broad-spectrum light reflectance. This strategy is widespread among insects, including cicadas, dragonflies, and butterflies, and has been shown to decrease predator detection rates (Hooper et al., 2006; Siddique et al., 2015; Zhang et al., 2006). For example, the compound eyes of moths feature hexagonal protuberances that reduce UV reflectance, aiding nocturnal concealment (Blagodatski et al., 2015; Stavenga et al., 2005). In butterflies, UV reflectance from eyespots on wings can attract predators, but reducing UV reflectance or eyespot size can lower predation risk and enhance camouflage (Chan et al., 2019; Lyytinen et al., 2004). Hence, the reflection of ultraviolet light from the insect cuticle surface increases the risk of predation by disrupting camouflage (Tovee, 1995)”

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      Thank you very much for pointing out the omission of the important reference on the “moth eye” effect. We sincerely apologize for the oversight. Based on your suggestion, we have now included the seminal paper by Clapham and Hutley (1973) in the revised manuscript. The reference has been added to both the Introduction and Discussion sections to provide a more comprehensive context for our discussion on anti-reflective structures in insects.

      (5) The introduction should be revised to accurately reflect the related contributions in literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopta undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to those natural ones, and further elucidated the antireflective mechanisms based on the brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

      Thank you very much for your valuable suggestions regarding the revision of the introduction to accurately reflect the relevant contributions in the literature. Based on your feedback, we have thoroughly revised the introduction and added the suggested references to provide a comprehensive context for our study. The details of these revisions can be found in lines 84-94 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) In Figure 2E, the data for Male-5d appears to be missing. Please verify and ensure all relevant data is included.

      Thank you for pointing out the issue regarding the data presentation in Figure 2E. We apologize for any confusion caused by the overlapping data points and the less conspicuous color choice for Male-5d. We have carefully reviewed the data and confirmed that all relevant data points, including Male-5d, are indeed present in the dataset. In the revised manuscript, we have adjusted the color scheme for Male-5d and Female-5d in Figure 2E to ensure that both curves are clearly distinguishable, even in areas where they overlap. This adjustment should facilitate a more accurate and convenient observation of the data trends. We appreciate your attention to detail, and we believe these revisions have improved the clarity and readability of the figure.

      (2) In Figure 6, please clarify the reflectance data in the inset. Clearly explain what the blue and light blue curves represent.

      Thank you for your suggestion regarding Figure 6. We have revised the figure to improve clarity. The light blue curve now represents the reflectance measurements of leafhoppers with higher brochosome coverage, while the dark blue curve corresponds to those with lower coverage. These changes, along with updated labels in the figure legend, ensure that the data are clearly distinguishable and easy to interpret. We appreciate your feedback and believe these revisions have enhanced the overall clarity of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses (clarifications needed):

      (1) Experimental Design:

      The study does not mention whether the authors examined sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers). Including these variables could provide a more nuanced understanding of group dynamics.

      We are grateful to the reviewer for pointing out this valuable question. We have clarified that future studies should include sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers) (p. 27).

      “Finally, future research should investigate additional variables, including sex differences and measures of attractiveness or hierarchy among participants, such as students versus teachers.”  p. 27

      (2) fNIRS Data Acquisition:

      The authors' approach to addressing individual differences in anatomy is lacking in detail. Understanding how they identified the optimal channels for synchrony between participants would be beneficial. Was this done by averaging to find the location with the highest coherence?

      We apologize for missing some details here. We have included the following information in the fNIRS data acquisition and fNIRS data analyses to clarify the details (pp. 8 and 12).

      We employed the one-sample t-test method to assess the GNS disparity between the baseline and task sessions, identifying particular channels of interest. This analysis did not ascertain the maximum coherence level, but rather pinpointed the channel exhibiting significant divergence between the two sessions, which we designated as pertinent to the group decision-making task. Furthermore, we selected the PFC and left TPJ as our reference brain regions, guided by existing literature.

      “Two optode probe sets were used to cover each participant's prefrontal and left TPJ regions (Figure S1). The DLPFC plays a crucial role in group decision-making processes, with findings suggesting that individuals exhibiting reduced prefrontal activity were more prone to out-group exclusion and demonstrated stronger in-group preferences (Goupil et al., 2021; Jankovic, 2014; Yang et al., 2020). Similarly, the left TPJ has been previously reported to be associated with decision-making and information exchange (Freitas et al., 2019; Tindale et al., 2019).”  p. 8

      “Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. Here, p-values were thresholded by controlling for FDR (p < 0.05; Benjamini & Hochberg, 1995). When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.” p. 12
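The FDR thresholding step described above can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. It assumes the per-channel p-values from the one-sample t-tests are already available; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it survives FDR control at `alpha`."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical per-channel p-values (illustration only).
channel_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant_channels = benjamini_hochberg(channel_p)
```

Channels flagged `True` would correspond to the regions of interest carried into the subsequent analyses.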

      (3) Behavioral Analysis:

      For group identification, the analysis currently uses a dichotomous approach. Introducing a regression model to capture the degree of identification could offer more granular insights into how varying levels of group identification affect collective behavior and performance.

      Thank you for your suggestion. As suggested, we have conducted the regression model to examine how varying levels of group identification affect collective performance, with the score of group identification being the independent variable and collective performance as the dependent variable (pp.9 and 15).

      “Moreover, we employed a regression model to examine how varying levels of group identification affect collective performance, using group identification scores as the independent variable and collective performance as the dependent variable.”  p.9

      “The results from the regression model highlighted a significant association between the degree of group identification and collective performance (β = 0.45, t = 4.56, p = 0.019).”  p.15

      (4) Single Brain Activation Analysis:

      The application of the General Linear Model (GLM) is unclear, particularly given the long block durations and absence of multiple trials. Further explanation is needed on how the GLM was implemented under these conditions.

      Thank you for your suggestion. We have added more details in this section (p.11).

      “In the GLM model analysis, HbO was the dependent variable, and the regression amount was set for different task stages (a. Reading information, b. Sharing private information, c. Discussing information, d. Decision). After that, we convolved the regression factor with the Hemodynamic Response Function (HRF) and obtained the brain activation β value of each participant in each channel at different task stages through regression analysis.”  p.11
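As a rough illustration of the regressor construction described above, the sketch below builds a boxcar for one task stage and convolves it with an HRF. The single-gamma HRF shape, the 1 Hz sampling rate, and the onset and offset values are all assumptions made for the example, not parameters reported in the study; the β value for a stage would then come from regressing the HbO signal on such regressors.

```python
import math


def gamma_hrf(duration_s=30.0, dt=1.0, shape=6.0, scale=1.0):
    """Single-gamma HRF kernel sampled every `dt` seconds, normalised to sum to 1.

    The shape and scale values are illustrative assumptions.
    """
    n = int(round(duration_s / dt))
    kernel = []
    for i in range(n):
        t = i * dt
        value = t ** (shape - 1) * math.exp(-t / scale)
        kernel.append(value / (scale ** shape * math.gamma(shape)))
    total = sum(kernel)
    return [k / total for k in kernel]


def stage_regressor(n_samples, onset, offset, hrf):
    """Boxcar that is 1 during [onset, offset) samples, convolved with the HRF."""
    boxcar = [1.0 if onset <= i < offset else 0.0 for i in range(n_samples)]
    out = [0.0] * n_samples
    for i in range(n_samples):
        acc = 0.0
        for j, h in enumerate(hrf):
            if i - j < 0:
                break
            acc += boxcar[i - j] * h
        out[i] = acc
    return out


# Hypothetical stage lasting from sample 10 to sample 40 in a 100-sample recording.
hrf = gamma_hrf()
reading_regressor = stage_regressor(100, 10, 40, hrf)
```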

      (5) Within-group neural Synchrony (GNS) Calculation:

      The method for calculating GNS could be improved by using mutual information instead of pairwise summation, as suggested by Xie et al. (2020) in their study on fMRI triadic hyperscanning. Additionally, the explanation of GNS calculation is inconsistent. At one point, it is mentioned that GNS was averaged across time and channels, while elsewhere, it is stated that channels with the highest GNS were selected. Clarification on this point is essential.

      We appreciate the reviewer for highlighting this inquiry. We utilized a conventional GNS calculation approach, as detailed in Line 296 of the manuscript, where the GNS was determined in pairs after the WTC computation, and then averaged. Further details regarding the second question have been provided in the article (p.12).
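The pairwise-then-average logic can be sketched as follows. Note that the pairwise measure here is a squared Pearson correlation used purely as a stand-in so that the example is self-contained; it is not wavelet transform coherence (WTC), and the function names are hypothetical.

```python
from itertools import combinations


def squared_correlation(x, y):
    """Stand-in pairwise coupling measure (NOT WTC): squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def group_neural_synchrony(member_signals, coherence=squared_correlation):
    """Average a pairwise coupling measure over all member pairs in a group."""
    pair_values = [coherence(a, b) for a, b in combinations(member_signals, 2)]
    return sum(pair_values) / len(pair_values)


# Hypothetical triad: three short time series from one channel.
member_a = [1.0, 2.0, 3.0, 4.0]
member_b = [1.0, 2.0, 3.0, 4.0]
member_c = [1.0, -1.0, 1.0, -1.0]
triad_gns = group_neural_synchrony([member_a, member_b, member_c])
```

For a triad there are three pairs (1-2, 1-3, 2-3), so the group value is the mean of three pairwise values; any real analysis would pass a WTC-based function as `coherence`.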

      (6) Placement of fNIRS Probes:

      The probes were only placed in the frontal regions, despite literature suggesting that the superior temporal sulcus (STS) and temporoparietal junction (TPJ) regions are crucial for triadic team performance. A justification for this choice or inclusion of these regions in future studies would be beneficial.

      The original manuscript clearly stated the use of two optode probe sets to encompass the prefrontal and left TPJ regions of each participant (see Figure S1, p. 8).

      (7) Interpretation of fNIRS Data:

      Given that fNIRS signals are slow, similar to BOLD signals in fMRI, the interpretation of Figure 6 raises concerns. It suggests that it takes several minutes (on the order of 4-5 minutes) for people to collaborate, which seems implausible. More context or re-evaluation of this interpretation is needed.

      The question you have pointed out is very pertinent, and we have added more explanation for this result (pp. 25-26).

      As previous studies have shown, the BOLD signal collected by fNIRS increases slowly compared to neuronal activity, which means that it has hysteresis (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization is delayed because people need time to exchange more dialogue, improve collaboration efficiency, and form the same preference (Zhang et al., 2019). For example, a study of group consensus found that participants showed significant neural alignment after completing a period of dialogue (Sievers et al., 2024). In cooperation tasks, neural synchronization increases as tacit understanding between the two participants improves (Cui et al., 2012). The generation of neural synchronization therefore depends on interaction over a period of time. Accordingly, we believe that the 4-5 minutes of collaboration time shown in Figure 6 may be related to team members establishing consensus and a shared preference, which is reflected in the dynamic change of neural synchronization over time.

      Moreover, previous studies on neural synchronization during social interaction and group decision-making revealed that substantial neural synchronization occurred around 50-55 seconds into a teaching task involving prior knowledge (Liu et al., 2019) and persisted approximately 6 minutes into the discussion period (Xie et al., 2023). These results collectively validate the suitability of utilizing fNIRS signal response time in our study (pp. 25-26).

      “Our study also has demonstrated significant increases in single-brain activation, DLPFC-OFC functional connectivity, and GNS at 7, 12, and 17 minutes, respectively, following task initiation. The significant increase in these neural activities together constructs the two-in-one neural model that explains how group identification influences the collective performance we proposed. As previous studies have shown, the BOLD signal collected by fNIRS is slowly increasing compared to neuronal activity, which means that it has hysteresis (Turner et al., 1998). In social interactions such as group decision-making, the time of neural synchronization is delayed because people need to spend time increasing the number of dialogues to improve collaboration efficiency and form the same preference (Zhang et al., 2019). For example, participants would exhibit significant neural alignment, but only after they had completed a period of dialogue (Sievers et al., 2024). In the task of cooperation, with the improvement of cooperation efficiency between two participants, the higher degree of neural synchronization (Cui et al., 2012). Therefore, the generation of neural synchronization depends on the interaction over a period of time, which can affect the estimation of collaboration time. Prior research has shown that when the teaching task with prior knowledge began 50-55 seconds, significant neural synchronization could be generated between teacher and students, which meant that students and teacher achieved the same goal of learning knowledge (Liu et al., 2019). Moreover, a noteworthy increase in GNS was observed approximately 6 minutes into the group discussion period for better discussing and solving the problem (Xie et al., 2023). These findings are similar to ours. Therefore, the time points we found could reflect the dynamic time change of the neural process of team collaboration.” pp.25-26

      Reviewer #2 (Public review):

      Weaknesses:

      The authors need to clearly articulate their hypothesis regarding why neural synchronization occurs during social interaction. For example, in line 284, it is stated that "It is plausible that neural synchronization is closely associated with group identification and collective performance...", but this is far from self-evident. Neural synchronization can occur even when people are merely watching a movie (Hasson et al., 2004), and movie-watchers are not engaged in collective behavior. There is no direct link between the IBS and collective behavior. The authors should explain why they believe inter-brain synchronization occurs in interactive settings and why they think it is related to collective behavior/performance.

      Thank you for bringing these points to our attention. We have clarified the relationship between neural synchronization and collective behavior in the Introduction section (p.4). Moreover, in order to investigate whether neural synchronization stems from a common task or environment, we pseudo-randomized all pairs of subjects and created a null distribution consisting of 1,000 pseudo-groups, as described in Lines 311-315. This approach enabled us to eliminate neural synchronization resulting from factors other than social interaction, allowing us to identify neural patterns associated with collective performance (p.12).

      “Moreover, Ni et al. (2024) indicated that neural synchronization was linked to the strength of social-emotional communication and connections between individuals. An increase in neural synchronization has also been shown to predict the coordination and cooperation abilities of group members (Lu et al., 2023). Therefore, we hypothesize that neural synchronization may be related to group performance.” p.4

      “After that, the nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples. By pseudo-randomizing the data of all participants, a null distribution of 1000 pseudo-groups was generated (e.g., time series from member 1 in group 1 were grouped with member 2 in group 2 & member 3 in group 3). The GNS of 1,000 reshuffled pseudo-groups was computed, and the GNS of the real groups was assessed by comparing it with the values generated by 1000 reshuffled pseudo-groups.” p.12
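The pseudo-group construction and the comparison against the null distribution can be sketched as below. The one-sided p-value with a +1 correction, the function names, and the random seed are assumptions made for illustration; the source does not specify these details.

```python
import random


def make_pseudo_groups(n_groups, n_permutations=1000, seed=0):
    """Build pseudo-triads whose three members come from three different real groups.

    Each pseudo-group is a tuple of (group_index, member_position) pairs,
    e.g. member 1 of group 4, member 2 of group 9, member 3 of group 0.
    """
    rng = random.Random(seed)
    pseudo_groups = []
    for _ in range(n_permutations):
        g1, g2, g3 = rng.sample(range(n_groups), 3)  # three distinct real groups
        pseudo_groups.append(((g1, 0), (g2, 1), (g3, 2)))
    return pseudo_groups


def permutation_p_value(observed, null_values):
    """One-sided empirical p-value: share of null values at least as large as observed."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical usage: 20 real groups, 1,000 reshuffled pseudo-groups.
pseudo_groups = make_pseudo_groups(n_groups=20)
```

In a full analysis, the GNS of each pseudo-group would be computed from the reassigned time series and passed to `permutation_p_value` together with the real-group value.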

      The authors state that "GNS in the OFC was a reliable neuromarker, indicating the influence of group identification on collective performance," but this claim is too strong. Please refer to Figure 4B. Do the authors really believe that collective performance can be predicted given the correlation with the large variance shown? There is a significant discrepancy between observing a correlation between two variables and asserting that one variable is a predictive biomarker for the other.

      Thank you for your suggestion. We have revised the relevant statement (p.18).

      “Through correlation and regression model analysis, we found that in group decision-making, the increase in group identity would affect group performance by improving GNS in the OFC brain region.”  p.18

      Why are the individual answers being analyzed as collective performance (See, L-184)? Although these are performances that emerge after the group discussion, they seem to be individual performances rather than collective ones. Typically, wouldn't the result of a consensus be considered a collective performance? The authors should clarify why the individual's answer is being treated as the measure of collective performance.

      We appreciate the insightful comment provided by the reviewer. The decision to utilize individual responses as a metric of overall performance is based on several key considerations. First, previous studies on various hidden profile tasks have utilized averaged individual scores to represent collective performance (e.g., Stasser et al., 1995; Wittenbaum et al., 1996; Brockner et al., 2022). Second, while consensus outcomes are typically regarded as collective expressions, we argue that in the context of this study, individual responses are not independent entities but rather extensions of the group decision-making process. The collective deliberation process significantly influenced individual thinking and decision-making in this study. Through group discussions, members shared perspectives, adjusted their stances, and formulated their responses based on collective insights. The responses provided by participants in this study were molded by the dynamics of group conversations, serving as an indirect measure of group performance and potentially indicating the efficacy of collective deliberations.

      Performing SPM-based mapping followed by conducting a t-test on the channels within statistically significant regions constitutes double dipping, which is not an acceptable method (Kriegeskorte et al., 2011). This issue is evident in, for example, Figures 3A and 4A.

      Please refer to the following source: https://www.nature.com/articles/nn.2303

      We have carefully reviewed the articles provided by the reviewer, and we acknowledge the concerns regarding selective analysis and double dipping in our statistical approach. To address this, we believe it is important to clarify this issue further in the Discussion section (pp.26-27).

      Our study introduces a novel perspective while utilizing conventional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), methods that are widely endorsed within the field. In our analysis, significant channels were first identified using a one-sample t-test, followed by additional analyses including ANOVA, independent samples t-tests, and other procedures. We would like to emphasize that the statistical assumptions underlying the one-sample t-test and paired-sample t-test in our study maintain a level of independence. Moreover, to further mitigate concerns about the potential for double dipping, we employed permutation testing to validate the robustness of our results and ensure that our findings are not influenced by biases inherent in the selection of significant regions.

      We recognize the importance of rigorous statistical practices and are committed to upholding the highest standards of analysis. As such, we have revisited our methodology and included a more detailed explanation of the steps taken to avoid double dipping and ensure the integrity of our analyses in the revised manuscript.

      “Although our study has found a new perspective, the analysis method still refers to and uses the traditional fNIR-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), which is generally accepted by the majority of fNIR-based hyperscanning researchers. For example, we would first identify significant channels through a one-sample t-test and then conduct further analyses, such as ANOVA or independent samples t-tests. Selective analysis is a powerful tool and is perfectly justified whenever the results are statistically independent of the selection criterion under the null hypothesis (Kriegeskorte et al., 2019). However, it may lead to double dipping and missing information. In this study, the absence of statistically significant TPJ activation in the analyzed data led to the TPJ being ignored. In the future, it should be made explicit in the analysis, and the reliability of the results should be ensured by appropriate statistical methods (e.g., cross-validation, independent data sets, or techniques to control for selective bias).” p.26-27

      In several key analyses within this study (e.g., single-brain activation in the paragraph starting from L398, neural synchronization in the paragraph starting from L393), the TPJ is mentioned alongside the DLPFC. However, in subsequent detailed analyses, the TPJ is entirely ignored.

      We thank the reviewer for your careful review and valuable comment. TPJ is referenced in certain analyses within this paper (as detailed in paragraphs L414 and L440); however, its role remains inadequately investigated and expounded upon in subsequent more intricate analyses. This is due to the absence of statistically significant TPJ activation in the analyzed data. As pointed out by the reviewer, limitations may exist in pursuing further analyses through ROIs, a point we also have addressed in the Discussion section (p.27).

      The method for analyzing single-brain activation is unclear. Although it is mentioned that GLM (generalized linear model) was used, it is not specified what regressors were prepared, nor which regressor's β-values are reported as brain activity. Without this information, it is difficult to assess the validity of the reported results.

      We have revised the relevant description to clarify the analyses of single-brain activation (p. 11)

      While the model illustrated in Figure 7 seems to be interesting, for me, it seems not to be based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented.

      We regret the confusion that has arisen. Firstly, as highlighted by the reviewer, the model depicted in Figure 7 is not directly derived from the causal analysis conducted in this study. Our investigation did not directly explore the causal relationships among the three indicators; instead, we constructed a model based on correlations and potential mechanisms. In the revised manuscript, we have explicitly stated that Figure 7 represents a descriptive model (p.22).

      Regarding Figure 5D, the reviewer noted that while it may offer some explanatory value, it lacks the necessary analytical detail to elucidate the chart's significance clearly. We have clarified the details of the analysis in Figure 5 (pp.13-14). The model in Figure 5D examines the connection between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the impact of each individual’s single-brain activation on the corresponding group’s GNS was regulated by their brain activation connectivity.

      “Finally, we employed correlation and mediation analyses to assess if brain activation connectivity could explain the connection between individuals’ single-brain activation and the related group’s GNS. We examined the connection between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the impact of each individual’s single-brain activation on the corresponding group’s GNS was regulated by their brain activation connectivity. We utilized the PROCESS tool in SPSS to investigate the proposed moderation effect. Specifically, we applied Model 1 with 5000 bootstrap resamples to examine the interaction between the independent variable (i.e., single-brain activation) and the moderator (i.e., brain activation connectivity) in predicting the dependent variable (i.e., GNS). It is noteworthy that prior to analysis, all variables in the moderation model were mean-centered to reduce multicollinearity and improve the interpretability of interaction terms.”  p.13-14
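The mean-centering and interaction-term steps of the moderation model can be illustrated with a small ordinary least squares sketch. This is not the PROCESS macro: the bootstrap resampling is omitted, and the function names and toy data are hypothetical. Here `x` stands for single-brain activation, `m` for brain activation connectivity, and `y` for GNS.

```python
def solve_linear(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_moderation(x, m, y):
    """OLS fit of y = b0 + b1*xc + b2*mc + b3*(xc*mc), with xc and mc mean-centered.

    Returns [b0, b1, b2, b3]; b3 is the interaction (moderation) coefficient.
    """
    mean_x = sum(x) / len(x)
    mean_m = sum(m) / len(m)
    rows = []
    for xi, mi in zip(x, m):
        xc, mc = xi - mean_x, mi - mean_m
        rows.append([1.0, xc, mc, xc * mc])
    k = 4
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Toy data constructed so that the true coefficients are 1, 2, 3 and 0.5.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_m = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
toy_y = [1 + 2 * (a - 3.5) + 3 * (b - 3.5) + 0.5 * (a - 3.5) * (b - 3.5)
         for a, b in zip(toy_x, toy_m)]
coefficients = fit_moderation(toy_x, toy_m, toy_y)
```

A bootstrap confidence interval for the interaction coefficient would be obtained by refitting on resampled participants, which is the part handled by the 5000 resamples mentioned above.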

      “Building on the above results, we have developed a two-in-one neural model that explains how group identification influences collective performance. This descriptive model aims to illustrate the potential interrelationships among these indicators and establish a conceptual framework to inspire forthcoming research endeavors.”  p.21

      The details of the experiment are not described at all. While I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study.

      As suggested, we have clarified the details of the experiment in the manuscript.

      (1) As stated in the public review, the details of the experiment are not described at all and while I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study. In points a-e below, I list the aspects that I could not fully understand, but I am not asking for direct answers to these points. Instead, please provide a detailed description of the experiment so that it can be replicated.

      Thank you for your suggestion; we have responded to each question sequentially and elaborated on the experiment specifics to ensure replicability.

      (a) Please provide more detailed information about the Group Identification Task. How much did each participant speak (was there any asymmetry in the amount of speaking, and was there any possibility that the asymmetry influenced the identification rating)? Did the three participants interact in person, or online? Are they isolated from experimenters? How was the rating conducted, what I mean is that it's a PC-based rating?

      We apologize for the lack of detail in our description of the procedures for the experiment.

      For the first question, we drew upon previous studies concerning the manipulation of group identity while controlling the content of pre-task conversations. Specifically, the high-identity group engaged in self-introductions and identified similarities among the three members, whereas the low-identity group discussed topics related to the current semester's classes (Xie et al., 2023; Yang et al., 2020). Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable. There was almost no asymmetry in the amount of speaking. We also conducted a manipulation check, which confirmed the effectiveness of our identity manipulation (pp.5-6).

      Xie, E., Li, K., Gu, R., Zhang, D., & Li, X. (2023). Verbal information exchange enhances collective performance through increasing group identification. NeuroImage, 279, 120339.

      Yang, J., Zhang, H., Ni, J., De Dreu, C. K., & Ma, Y. (2020). Within-group synchronization in the prefrontal cortex associates with intergroup conflict. Nature neuroscience, 23(6), 754-760.

      “Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable.”  p.5-6

      For the second question, the three participants interacted offline in a face-to-face setting, while the experimenter remained outside the laboratory (p.6).

      “The three participants conducted face-to-face offline interaction throughout the manipulation process.” p.6

      For the third question, at the beginning of the experimental task, participants were isolated from the experimenters (p.6).

      “In addition to explaining the next phase of the task and controlling the timer, experimenters would be isolated from participants.” p.6

      For the last question, the rating of group identification was conducted through a questionnaire presented on participants’ phones (p.6).

      “The questionnaire was presented on participants’ phones.” p.6

      (b) The procedures of the Main Task are also unclear. For the Reading Information (5 min): How was the information presented? PC-based or paper-based? How were the participants seated? Did they read it independently?

      We apologize for the missing details. We have included the following information in the article.

      For the first and last question, each participant would get a piece of paper, which presents the common information and private information. They read independently. (p.6)

      “Each participant would get a piece of paper, which presented the information. Participants could read independently.” p.6

      About how the participants sat, the three participants sat around a table without partitions between each other. Only in the discussion stage, they could communicate face-to-face (p.6).

      “They sat around a table without partitions between each other.” p.6

      “In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (c) For Sharing Private Information: The authors stated they share text messages using Tencent Meeting. If so, how and with what devices? How was the information displayed on the screen? Were the participants even in the same room?

      Thank you for your reminder. We have added more details now (p.6). Firstly, the experimenter sent the Tencent Meeting link to the participants. After the participants entered the meeting through their mobile phones, they could text the information they wanted to share in the chat box of the meeting. They were in the same room, and because Tencent Meeting recorded the shared information, the participants could view it at any time.

      “During the group sharing, participants entered Tencent Meeting via their mobile phones and were able to text their private information in the chat box to their group members for 5 minutes.” p.6

      (d) For Discussing Information: It's a verbal interaction. How did they interact with others? What is the distance between them? I found a very small picture in Figure 8, but that is all information about experiment settings, that is provided by the authors.

      We are sorry about the missing details. As explained in the article, the discussion was verbal, so participants could talk face to face in one room. We have included the following information in the article (p.6).

      “Participants were sitting and communicating around a table. The distance between adjacent participants was about 15 cm, and the distance between face-to-face participants was about 40 cm. In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (e) For the Decision Process (5 min): How did they answer (What I mean is verbally, writing, or computer-based input), and how did the experimenters record these answers?

      The questions were presented on paper, so the participants could write down their answers and the experimenters could count the answers on paper. We have included the following information in the article (p.7).

      “After discussion, all triads were given 5 minutes to answer the following questions (i) the probability of three suspects, 0%-100% for each suspect; (ii) the motivation and tool of crime; and (iii) deduced the entire process of crime. The three questions were presented on paper, allowing participants to write their answers directly on the same sheet. Subsequently, three independent raters used these paper questionnaires to record and calculate the scores for each group.” p.7

      (2) I find the model presented in Figure 7 to be intriguing. Understanding why inter-brain synchronization occurs and how it is supported by specific single-brain activations or intra-brain functional connectivity is indeed a critical area for researchers conducting hyperscanning studies to explore. However, the content depicted in this model is not based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented. Please include a detailed explanation.

      The specific answers are available on page 5 of our response letter.

      (3) The analysis of single-brain activation analysis (and probably other analyses) focuses on the period from reading to making decisions (L237). Why was this entire interval chosen for analysis? Reading does not involve social interaction. As mentioned in a previous comment, the details of the tasks are unclear, so it's difficult to understand what was actually done in the reading period. Anyway, why were these different phases combined as the focus of analysis? Please clarify the reasoning behind this choice.

      Thank you for your feedback. The decision to analyze the entire interval, spanning from reading to decision-making, was primarily made to grasp the continuum of information processing comprehensively. While reading itself lacks social interaction, it serves as the foundation for subsequent decision-making, during which participants' cognitive states and affective responses gradually evolve. Therefore, examining these two phases collectively enables a more thorough investigation into how information influences decision-making. Furthermore, considering the task details remain ambiguous, we aim to uncover the underlying cognitive and affective mechanisms through a holistic analysis.

      (4) The method for analyzing single-brain activation is unclear. Please provide a detailed description of the analysis methods.

      Thank you for your suggestion. We have added more details in the Method section (p.11).

      “In the GLM model analysis, HbO was the dependent variable, and the regression amount was set to different task stages (a. Reading information, b. Sharing private information, c. Discussion information, d. Decision). After that, we convolved the regression factor with the Hemodynamic Response Function (HRF), and obtained the brain activation β value of each participant in each channel at different task stages through regression analysis.”  p.11
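
      The GLM step quoted above can be illustrated with a minimal sketch for a single channel. It is not the authors' pipeline: it assumes a 1 Hz sampling rate, a double-gamma HRF with conventional shape parameters, hypothetical 40 s task stages separated by 20 s rests, and noise-free synthetic HbO. All function names and values are illustrative assumptions.

```python
import math

def hrf(t, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (seconds); shape parameters are assumed."""
    if t <= 0:
        return 0.0
    def gamma_pdf(shape):
        return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)
    return gamma_pdf(peak) - ratio * gamma_pdf(undershoot)

def convolve(signal, kernel):
    """Causal discrete convolution, truncated to len(signal)."""
    out = []
    for i in range(len(signal)):
        lo = max(0, i - len(kernel) + 1)
        out.append(sum(signal[j] * kernel[i - j] for j in range(lo, i + 1)))
    return out

def ols(X, y):
    """Least-squares betas from the normal equations via Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot_row = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

# Hypothetical design: four 40 s stages separated by 20 s rests, sampled at 1 Hz.
n_samples = 240
onsets = {"reading": 0, "sharing": 60, "discussion": 120, "decision": 180}
kernel = [hrf(t) for t in range(32)]
regressors = [
    convolve([1.0 if onset <= t < onset + 40 else 0.0 for t in range(n_samples)], kernel)
    for onset in onsets.values()
]
# Design matrix: intercept column followed by one column per task stage.
X = [[1.0] + [reg[t] for reg in regressors] for t in range(n_samples)]

# Synthetic, noise-free HbO built from known betas so the fit can be checked.
true_betas = [0.5, 0.2, 0.4, 0.8, 0.3]
hbo = [sum(b * x for b, x in zip(true_betas, row)) for row in X]
betas = ols(X, hbo)  # intercept, then one beta per task stage
```

      In practice the same fit would be repeated for every participant and channel, giving one β per task stage for each, as the quoted text describes.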

      (5) In the periods of Reading Information and Sharing Private Information, there appears to be no social interaction between participants (Figure1D). However, Figure 6 shows an increase in brain activity correlation even during the first 10 minutes (it corresponds to the Reading and Sharing period). Why does inter-brain correlation (GNS, in this study) increase even though there is no interaction between participants? Please provide an explanation.

      Sharing private information fosters interactive engagement, necessitating its exchange during Tencent Meetings to facilitate sharing. Previous research suggests that heightened correlations in brain activity can be attributed to (1) intrinsic cognitive processes, wherein participants display similar cognitive and emotional responses, fostering shared cognitive processing and brain activity synchronization despite limited external interaction; (2) emotional connections, as divulging private information elicits emotional responses that can be neurally correlated among individuals; and (3) environmental influences, where shared environments and contexts prompt neural interaction among participants even in the absence of direct social engagement. These factors collectively contribute to increased brain activity correlations without active interaction. Our primary focus, however, lies in the phase characterized by significant synchronized brain activity.

      Minor Comments:

      (6) Equation 1 Explanation: There is no explanation of Equation 1. It mentions Yi as the collective score, but what constitutes the collective score Yi is not defined in the manuscript. Additionally, while "i" is referred to as an item (in Line 196), the meaning of "item" is not clear. Therefore, the meaning of this equation is not understood.

      We apologize for this confusion. We have added a description in the manuscript (p.9).

      “In Eq.1, x is the individual score, y is the collective score (y is calculated from the three per capita scores), and i stands for the group number for the item. So, x_i means the individual score of participants in the i group, and y_i means the collective score of the i group. d(x, y) represents the distance from the individual to the collective score.” p.9

      (7) Equation 2 Explanation: There is no explanation for Equation 2. Please provide descriptions for all variables such as S, t, and w.

      We stated the meaning of s, t, and W in the original version of the manuscript (p.12).

      As shown in L291-293: Here, t denotes the time, s denotes the wavelet scale, 〈⋅〉 represents a smoothing operation in time, and W is the continuous wavelet transform (Grinsted, Moore, & Jevrejeva, 2004).

      (8) Acronyms: Please define all acronyms upon their first appearance (e.g., CFI, TLI, RMSEA in L380).

      We apologize for these mistakes, and we have added full explanations for abbreviations upon their first use (p.16).

      “The mediation model demonstrated a satisfactory fit (CFI = 0.93, TLI = 0.93, RMSEA = 0.04) (CFI-Comparative Fit Index; TLI-Tucker-Lewis index; RMSEA-Root-Mean-Square Error of Approximation), suggesting that the perceived group identification of each individual affected the alterations in single-brain activations in the DLPFC, consequently leading to variations in their performance (β<sub>a</sub> = 0.16, t = 2.20, p = 0.030; β<sub>b</sub> = 0.26, t = 3.56, p < 0.001; β<sub>c</sub> = 0.18, t = 2.34, p = 0.020) (Figure 3C).”  p.16

      (9) Hyperscanning fMRI Studies: Since there are hyperscanning fMRI studies analyzing communication among three people (e.g., Xie et al., 2020, PNAS), it would be beneficial to cite this research. pnas.org/doi/pdf/10.1073/pnas.1917407117.

      As suggested, we have cited this paper. (p.4)

      (10) Line 272; Line 275: Should these references be to Benjamini & Hochberg (1995)?

      As suggested, we have revised our citation.

      (11) Research Objectives: The authors' aim seems to be understanding the relationship between Group Identification Level (High or Low), collective performance, and inter-brain synchronization (GNS). If so, shouldn't the results shown in Figure 6 illustrate how these differ between High and Low groups?

      We are grateful to the reviewer for your insightful comment. This study aimed to investigate the impact of group identity levels on collective performance and interbrain synchronization. Our analysis primarily focused on inter-group disparities to elucidate the potential influence of varying levels of group identification on collective behavior and neural synchrony, as highlighted by the reviewer. It is important to note that the relationship between group identification levels and collective performance, as well as neural synchronization, may represent a continuous or correlational process, rather than a binary comparison between two distinct groups. Notably, we treated group identification as a continuous variable and, consequently, Figure 6 was designed to illustrate trends in the association between group identification levels and both collective performance and neural synchronization, without conducting significance tests between groups. We are confident that the depiction in Figure 6 effectively captures the evolving dynamics between group identification levels and both collective performance and neural synchronization.

      (12) Figure 6 Star-Marker: What is the star marker shown in Figure 6? Please provide an explanation.

      We apologize for this confusion. We have added this explanation to the article. (p.21)

      “The red star sign indicates that at this time point, the neural signal began to increase significantly.” p.21

      (13) Pearson's Correlation: Use "Pearson's correlation" instead of "Pearson correlation."

      Thank you for your comment. We have changed "Pearson correlation" to "Pearson's correlation" in 10 places in the original text (pp. 9, 11, 13, 15, 16, 19, 23).

      “Moreover, the Pearson’s correlation was used to examine the relationship between group identification_2 and collective performance.” p.9

      “Subsequently, we used Pearson’s correlation analyses to investigate the relationship between single-brain activation and individual performance.” p.11

      “Second, the Pearson’s correlation between GNS and collective performance was performed.” p.13

      “Following that, we analyzed Pearson’s correlations between the original HbO data in the region related to individual and collective performance, denoted as brain activation connectivity (Lu et al., 2010).” p.13

      “Subsequently, the Pearson’s correlation between the quality of information exchange and collective performance was assessed.” p.15

      “Furthermore, the results of the Pearson’s correlation indicated that groups with higher group identification were more likely to exhibit better collective performance (r = 0.38, p = 0.003) (Figure 2B).” p.15

      “The Pearson’s correlation and its associated analyses were based on the data from group identification_2. *p < 0.05.” p.16

      “We first extracted the HbO brain activities related to individual performance (e.g., DLPFC, CH4) and collective performance (e.g., OFC, CH21) of each group member and conducted a Pearson’s correlation between the two.” p.19

      “Subsequently, Pearson’s correlation was used to test whether individual differences in the similarity in individual-collective performance were reflected by DLPFC-OFC connectivity.” p.19

      “Pearson’s correlation showed that the higher quality of information exchange, the better collective performance (r = 0.36, p = 0.007) (Figure 8C).” p.23

      (14) MNI Coordinates: The MNI coordinates for each channel are listed in the supporting information. How were these coordinates measured? Were they consistent for all participants? Was MRI conducted for each participant to obtain these coordinates?

      Thank you for your reminder. We have included the necessary details in the revised version. First, we should clarify that we followed previous literature to determine the placement of the optode probe sets. After data collection was complete, we used the Vpen positioning system to locate the detection light poles and obtain the MNI coordinates. These coordinates were largely consistent across participants. (p.8)

      “For each participant, one 3 × 5 optode probe set (8 emitters and 7 detectors forming 22 measurement points with 3 cm optode separation, see Table S1 for detailed MNI coordinates) was placed over the prefrontal cortex (reference optode is placed at Fpz, following the international 10-20 system for positioning). The other 2 × 4 probe set (4 emitters and 4 detectors forming 10 measurement points with 3 cm optode separation, see Table S2 for detailed MNI coordinates) was placed over the left TPJ (reference optode is placed at T3, following the international 10-20 system for positioning). The probe sets were examined and adjusted to ensure consistency of the positions across the participants. After the completion of data collection, we utilized the Vpen positioning system to accurately locate the detection light poles, ultimately obtaining the MNI positioning coordinates.”  p.8

    1. Author response:

      The following is the authors’ response to the previous reviews

      Joint Public Reviews:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      Strengths

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Challenges

      Despite the breadth of information presented, and although many of the suggestions in the initial review were addressed well, some points related to quantification and discussion of sex differences are not fully addressed in this revision.

      (1) The request for quantification of OEC bridges is not fully addressed. We note that this revision includes the following statement (page 6): "We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals." However, the title of the paper states that olfactory ensheathing cells promote neural repair and the abstract states that "OECs transplanted near the injury site modify the inhibitory glial scar and facilitate axon regeneration past the scar border and into the lesion." Statements such as these make it more crucial to include quantification of OEC bridges, because if single images are shown of remarkable, unusual bridges, but only one sentence acknowledges the low frequency of this occurrence, then this information taken together might present the wrong takeaway to readers.

      Including some sort of quantification of bridging, whether it be the number of rats exhibiting bridges, the percentage area of OECs near a lesion site, or some other meaningful analysis, would add rigor and clarity to the manuscript.

      The short answer to the OEC bridges quantification is that in our last 2 studies combined, we observed bridges in 3/13 OB-OEC-transplanted rats versus 0/16 control rats (p=0.042 by two-sample proportion test; Thornton et al., 2018, Dixie, 2019). In addition to the new data on bridge formation shown in the current manuscript, our previous and most impressive data of serotonergic axons (5-HT-labeled, red) that crossed the entire lesion site is shown below (from Thornton et al., 2018). The image together with Supplemental video 1 (https://ars.els-cdn.com/content/image/1-s2.0-S0014488618302632-mmc1.mp4) show a reconstruction of multiple sections containing serotonergic axons that bridge the injury site in one OEC-transplanted, completely transected rat (1/5 OEC vs. 0/5 fibroblast-transplanted rat). The video also shows retrogradely-labeled Pseudo-rabies virus taken up by a few scattered neurons (green dots) within and above the lesion site, additional evidence suggesting axonal regeneration.

      In addition to adding bridge quantification in the Results section, we now discuss quantified results on physiological and anatomical evidence of axon regeneration across the injury site from five of the six large spinal cord injury (SCI) studies conducted by the Phelps and Edgerton laboratories. Our studies used the most difficult SCI model, a complete, thoracic spinal cord transection in adult rats, followed by OB-OEC transplantation. This is the only model in which axon regeneration can be differentiated from axon sparing found in incomplete SCIs. An introductory paragraph now summarizes and references data generated from these studies that specifically addresses questions about how OECs modify the injury site and facilitate axonal outgrowth into and across the lesion core. While relatively few axons cross the entire injury site to reach the caudal spinal cord, many more axons project into the injury site of OEC-transplanted rats compared to those in control rats. Quantification of axonal outgrowth into the lesion site of completely transected, OEC-transplanted rats from three previous long-term studies is now discussed in the Introduction. Based on both physiological and anatomical evidence reviewed from our previous work, we hope the editors and Reviewer agree that our previous studies have shown that OECs promote axonal outgrowth and modify the injury site.

      Page 5, Introduction:

      “Together with collaborators, we conducted six spinal cord injury studies in adult rats with a completely transected, thoracic spinal cord model followed by OB-OEC transplantation (Kubasak et al., 2008; Takeoka et al., 2011; Ziegler et al., 2011; Khankan et al., 2016; Thornton et al., 2018; Dixie, 2019). Results from five of our six studies showed physiological and anatomical evidence of axonal regeneration into and occasionally across the injury site. In 6-8-month-long studies, Takeoka et al. (2011) and Ziegler et al. (2011) reported physiological evidence of motor connectivity across the transection in OEC- but not media-transplanted rats. These experiments used transcranial electric stimulation of the motor cortex or brainstem to detect motor-evoked potentials (MEPs) with EMG electrodes in hindlimb muscles at 4- and 7-months post-transection. After 7 months, 70% of OEC-treated rats responded to stimulation with hindlimb MEPs (motor cortex, 5/20; brainstem 12/20; Takeoka et al, 2011). A complete re-transection above the original transection was carried out one month later and all MEPs in OEC-injected rats were eliminated. These results provide physiological evidence of axon conductivity across the injury site in OEC-treated rats. Additionally, three of our long-term studies evaluated anatomical axonal outgrowth of the descending serotonergic Raphespinal pathway into and through the injury site. Significantly more serotonergic-labeled axons crossed the rostral inhibitory scar border (Takeoka et al., 2011) or occupied a larger area within the injury site core (Thornton et al., 2018, Dixie, 2019) in OEC-transplanted rats than in fibroblast or media controls. In addition, significantly more neurofilament-labeled axons were found within the lesion core of OEC-transplanted versus control rats (Thornton et al., 2018, Dixie, 2019).”

      Page 7, Results: We revised the sentence below and added additional information.

      “We note, however, that such bridge formation is rare following severe spinal cord injury in adult mammals and was detected in 2 out of 8 OEC-transplanted rats and 0/11 media or fibroblast-transplanted controls in this study (Dixie, 2019). Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test). Bridge formation, in conjunction with the additional physiological and anatomical evidence of axonal connections across the injury site presented in our previous studies, strongly supports the capacity of OECs in neural repair.”

      Page 46, Figure legend 1: We added statistical data to the legend

      “Bridge formation across the injury site was observed in 2 of 8 OEC-transplanted and 0 of 11 fibroblast- or media-transplanted spinal cord transected rats. Combined with the 1/5 OEC-transplanted rats with axons crossing the injury and 0/5 fibroblast controls in our previous study (Thornton, 2018), we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls (p=0.042, two-sample proportion test).”
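
      The two-sample proportion test applied to the combined bridge counts can be sketched as follows. The response does not say which variant was used, so the pooled z-test without continuity correction shown here is an assumption, and the function name is illustrative only.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Counts reported in the response: bridges in 3/13 OEC rats vs 0/16 controls.
z, p_value = two_proportion_z_test(3, 13, 0, 16)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

      Under this assumed variant, the counts 3/13 and 0/16 give a two-sided p-value of about 0.042, in line with the value reported above.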

      (2) The additional discussion of sex differences in OEC bridging elaborates on the choice to study female rats, citing bladder challenges in male rats, but does not note salient clinical implications of this choice. Men account for ~80% of spinal cord injuries and likely also have worsened urinary tract issues, so it would be important to acknowledge this clinical fact and consider including males in future studies.

      Response: We agree that studying SCI repair in male rodents is very important as most people with these injuries are male. We did find one publication by Walker et al. (2019, Journal of Neurotrauma 36:1974-1984) that looked at sex differences in age-matched male and female rats after a moderate contusion SCI. They examined a number of histological and functional features, and did not find many differences between the sexes. Compared to studies of moderate SCI, studies using a completely transected spinal cord model must carry out manual bladder expressions a minimum of twice a day throughout the entire 5 to 7-month study in order to maintain kidney health. Because male urethras are much longer than those of females, males are much more likely than females to die from kidney disease during complicated, long-term studies such as ours. Fortunately, most SCIs in humans are contusions rather than complete transections, so an incomplete contusion model is most appropriate for studying sex differences. We modified the previous statement in our Discussion section as below.

      Page 25, Discussion

      “We acknowledge that in humans, males account for ~80% of spinal cord injuries (National Spinal Cord Injury Statistical Center, 2024) and sustain more serious urinary tract issues than females. We examined females in the current study due to practical experimental considerations, but it is necessary to examine males in future studies.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) It is strongly recommended that some sort of quantification of bridging be included in the figures or in a table, whether this is the number of rats showing bridges, the percent area of OECs near the lesion site, or some other meaningful analysis.

      As discussed in the response to Challenge (1) above, we observed bridges in 3/13 OEC-transplanted rats vs 0/16 controls across our two most recent studies. In addition, we added evidence of physiological and anatomical axonal connections across the injury site from our previous studies. We have added this information to the Introduction, Results, and Figure legend 1.

      (2) It is recommended that clinical sex differences in spinal cord injury (with ~80% occurring in men) be acknowledged in the Discussion. This clinical fact could be directly mentioned without much justification.

      See Challenge (2) above and addition to the Discussion on page 25.

      (3) Figs. 1, 5, 6: There is still no quantification included for these figures, which detracts from the ability of readers to understand the context and importance of these results. It is recommended to include quantification for these figures.

      Response regarding quantification associated with Figures 1, 5 and 6:

      Regarding Figure 1: We have discussed the additions to the text of the Introduction, Results and the legend of Figure 1 in detail on pages 2-3 of this response. These are important new additions to our paper.

      Regarding Figure 5: We added quantitative information regarding the analysis of Connective Tissue Growth Factor (Ctgf) expression in the injury site.

      Page 10-11, Results:

      “We found high levels of Ctgf expression in GFP-OECs (n=4 rats) that bridged much of the injury site and also detected Ctgf on near-by cells (Figure 5d, d1-2). GFP-labeled fibroblast transplantations (n=3 rats) served as controls and also expressed Ctgf.”

      Page 36, Methods:

      “To examine Ctgf expression in the spinal cord lesion site, we processed 1 slide per animal with ~6 equally-spaced sagittal sections throughout spinal cord from the Khankan et al. (2016) study. Our aim was to assess if transplanted OECs (n=4 rats) and transplanted fibroblasts (n=3 rats) express CTGF in the injury site.”

      Regarding Figure 6: The statistics for Figure 6 are found on page 13 of the Results section and page 38 of the Methods section. We now added the statistics to the Figure 6 legend on page 49.

      Page 13, Results:

      “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large percentage (~3/4<sup>ths</sup>) of proliferative OECs are characterized by large, flat morphology and weak Ngfr<sup>p75</sup> expression resembling the previously described morphology-based astrocyte-like subtype. Our results indicate the two types of OEC classifications share certain degrees of overlap, indicating similarities but also differences between the two classification methods.”

      Page 38, Methods: Morphological analyses of Ki67 OEC subtypes

      “To determine if OEC progenitor cells marked with Ki67 immunoreactivity have a distinctive morphology, purified and fixed OEC cultures from 4 rats were processed with anti-Ngfr<sup>p75</sup>, anti-Ki67 and counterstained with Hoechst (Bis-benzimide, 1:500, Sigma-Aldrich, #B2261). Images were acquired from 7-10 randomly selected fields/sample using an Olympus AX70 microscope and Zen image processing and analysis software (Carl Zeiss). We distinguished the larger, flat ‘astrocyte-like’ OECs from the smaller, fusiform ‘Schwann cell-like’ OECs, and recorded their expression of Ngfr<sup>p75</sup> and Ki67. Cell counts from each field were averaged per rat and then averaged into a group mean ± SEM. A Student's t-test was conducted to compare the effect of Ngfr<sup>p75</sup>-labeled cell morphology and the proliferative marker Ki67. Statistical significance was determined by p < 0.05.”
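
      The two-level averaging described in this Methods passage (fields averaged per rat, then rats averaged into a group mean ± SEM) can be sketched as below. The percentages are hypothetical placeholders, not data from the study, and the function name is illustrative.

```python
import math
import statistics

def group_mean_sem(fields_by_rat):
    """Average field-level values per rat, then return the group mean and SEM."""
    per_rat = [statistics.mean(fields) for fields in fields_by_rat.values()]
    group_mean = statistics.mean(per_rat)
    # SEM uses the sample standard deviation across rats, so n is the number of rats.
    sem = statistics.stdev(per_rat) / math.sqrt(len(per_rat))
    return group_mean, sem

# Hypothetical percentages of Ki67-positive OECs that are flat, per imaged field.
flat_percent = {
    "rat1": [70.0, 80.0],
    "rat2": [60.0, 70.0],
    "rat3": [90.0, 90.0],
    "rat4": [80.0, 84.0],
}
mean_flat, sem_flat = group_mean_sem(flat_percent)
```

      Averaging within each rat first means that the rat, not the imaged field, is treated as the unit of analysis, which matches the reported n=4.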

      Page 49, Figure 6 legend:

      “Of the OEC progenitors that express Ki67, 76% ± 8% of them display low levels of Ngfr<sup>p75</sup> immunoreactivity and a “flat” morphology (g2, h2; green nuclei, arrowheads). The remainder of Ki67-expressing OECs express high levels of Ngfr<sup>p75</sup> and are fusiform in shape (24% ± 8%, n=4 cultures, Student's t-test, p = 0.023).”

      (4) Fig. 9: Quantification is still not included in the figure for these Western blots, although it is appreciated that the authors included some quantification in their response letter. Including this in the figure would provide clarification for the reader.

      Thank you for your suggestion. We have now added the quantification to Figure 9, and describe the methods used for Western blot quantification in the Methods and the figure legend.

      Page 32, Methods:

      “For quantification, ImageJ software (NIH) was used to analyze the densitometric data. Western blot images at 400, 300, and 150 kDa resolution were converted to grayscale followed by manually defining a Region of Interest (ROI) frame that captured the entire band in each lane using the "Rectangular" tool. The area of each selected band was measured by employing the same ROI frame around the band to record the integrated density, “Grey Mean Value”. Background measurements were similarly quantified, and background subtraction was performed by deducting the inverted background from the inverted band value. For relative quantification, target protein bands were normalized to the corresponding loading control (GAPDH) to derive normalized protein expression (fold change). Band intensities were quantified in triplicate for each sample. Data were analyzed with the Mann-Whitney U test to compare normalized protein expression between the Reln<sup>-/-</sup> group and the other groups. A one-sided p-value was calculated to test the hypothesis that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group (negative control). Statistical significance was determined at p < 0.05. Analysis was performed using GraphPad Prism (version 9).”
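
      The quantification steps quoted above (inversion, background subtraction, GAPDH normalization, one-sided Mann-Whitney U test) can be sketched as follows. This is not the authors' ImageJ/GraphPad workflow: the 8-bit gray scale, the use of mean gray values, the exact permutation p-value, and all numbers are assumptions for illustration.

```python
from itertools import combinations

def normalized_expression(band, band_bg, gapdh, gapdh_bg, max_gray=255):
    """Invert gray values, subtract inverted background, normalize to GAPDH."""
    target = (max_gray - band) - (max_gray - band_bg)
    loading = (max_gray - gapdh) - (max_gray - gapdh_bg)
    return target / loading

def u_statistic(a, b):
    """Number of (a, b) pairs with a > b, counting ties as one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

def mann_whitney_greater(test, control):
    """Exact one-sided p-value for the alternative that test exceeds control."""
    observed = u_statistic(test, control)
    pooled = list(test) + list(control)
    as_extreme = total = 0
    # Enumerate every way of relabeling the pooled values as the test group.
    for idx in combinations(range(len(pooled)), len(test)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [v for i, v in enumerate(pooled) if i not in chosen]
        as_extreme += u_statistic(a, b) >= observed
        total += 1
    return as_extreme / total

# Hypothetical normalized expression values (not data from the study).
test_group = [0.8, 0.9, 1.1, 1.2]
negative_control = [0.10, 0.20, 0.15]
p_one_sided = mann_whitney_greater(test_group, negative_control)
```

      With four test values that all exceed three control values, only one of the 35 possible relabelings is as extreme as the observed one, so the one-sided p-value is 1/35 (about 0.029).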

      Page 52, Figure legend 9f:

      “(f) Quantitation of multiple isoforms of Reelin from 4-15% gradient gels. Positive and negative controls are Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cortices. Both rat tissue from the ONL (n=3) and CM (n=9) contain more 400 and 300 kDa Reelin compared to the Reln<sup>-/-</sup> mouse. Bars represent the standard deviation of the mean. One-sided Mann-Whitney U test was used to test that protein expression levels in the other groups are greater than those in the Reln<sup>-/-</sup> group, indicative of significant expression of Reln in the test groups. *p < 0.05.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the Authors):

      The interpretation of results obtained with opto-Treacle (related to Figure 2C) may be expanded.

      We thank the reviewer for their insightful comment regarding the interpretation of the results obtained with opto-Treacle. We understand the concern that the difference in the size of the condensates formed by opto-Treacle (Figure 2C) compared to Treacle-2S or other constructs may raise questions about the role of tetramerization in driving condensate formation, as 2S is known to tetramerize while FusionRed is not susceptible to multimerization.

      To address this concern, we emphasize that we have demonstrated that overexpressed Treacle forms large condensates even in the absence of any fluorescent protein, as included in the revised manuscript. This observation supports the conclusion that Treacle's ability to form condensates is intrinsic and does not depend on the multimerization capacity of the fluorescent tag.

      We believe that the observed difference in condensate size between opto-Treacle and Treacle-2S, Treacle-GFP, or untagged Treacle arises primarily from the time available for condensate assembly. Opto-Treacle condensation occurs rapidly, within approximately 10 seconds of blue light illumination, whereas Treacle-2S, Treacle-GFP, or untagged Treacle undergo condensation over the extended period of 24–48 hours of protein overexpression. This temporal difference likely accounts for the disparity in condensate size, as longer assembly times allow for larger and more mature condensates to form.

      Given this reasoning, we consider it unnecessary to further emphasize the size differences in the main text of the article, as we believe the underlying explanation is clear and supported by the data. Nonetheless, we are open to incorporating additional clarifications if the reviewer deems it necessary.

      The authors might reconsider referring to Treacle as a scaffold. Ultimately, the scaffold for the nucleolus is the rDNA with its bound proteins. Scaffold proteins, by definition, bind multiple protein partners and facilitate the formation of multiprotein complexes, a role not really attributed to homotypic LLPS.

      We thank the reviewer for raising this important point regarding the use of the term "scaffold" in relation to Treacle. We fully acknowledge that rDNA, along with its associated protein complexes, serves as the primary structural scaffold for the nucleolus. However, we believe that referring to Treacle as a scaffold is appropriate and justified within the specific context of our study.

      First, we emphasize that we describe Treacle as a scaffold specifically for nucleolar fibrillar centers (FCs), rather than for the nucleolus as a whole. This distinction is important, as our work focuses on the role of Treacle in organizing FC components, rather than the broader structural organization of the nucleolus.

      Second, as the reviewer notes, scaffold proteins are defined by their ability to bind multiple protein partners and facilitate the formation of multiprotein complexes. Our findings demonstrate that Treacle's condensation properties promote the binding and retention of key rDNA-associated protein partners, including RPA194, UBF, and Fibrillarin, within the FCs. This activity aligns with the functional definition of a scaffold protein, as Treacle supports the spatial organization and cooperative interactions of FC components essential for rRNA transcription and processing. Therefore, while we appreciate the reviewer's observation regarding the central role of rDNA as a nucleolar scaffold, we maintain that the use of the term "scaffold" to describe Treacle's role in organizing FCs is consistent with its demonstrated functional properties.

      If authors decide to add the "Ideas and Speculation" subsection to their Discussion, it may be interesting to discuss the following outstanding questions: does Treacle undergo homotypic or heterotypic LLPS? Does its overexpression favor homotypic interactions? How does it segregate FC and DFC compartments -by exclusion? How does phase-separated Treacle interact with other proteins?

      We thank the reviewer for these insightful questions. While we believe that adding a dedicated "Ideas and Speculation" subsection would be redundant, we have already addressed the questions regarding Treacle’s homotypic or heterotypic LLPS and its interactions with other proteins in the revised "Discussion" section. Additionally, we have included a new section in the manuscript specifically focused on investigating the role of Treacle condensation in its interactions with protein partners, further expanding on these points.

      In Materials and Methods, smFISH section -"probes were designed as described (Yao et al, 2019) and labeled with FITS on the 3'ends" - was it meant to say FITC (i.e. Fluorescein)?

      We thank the reviewer for catching this error. This was indeed a typo, and we have corrected it to "FITC (i.e., Fluorescein)" in the revised text.

      Reviewer #2 (Recommendations for the Authors):

      Regarding recombinant Treacle, the main concern is that the authors may not be observing the condensation of Treacle itself. The quality of the purchased recombinant Treacle is unclear (this reviewer could not find Treacle listed on the vendor website despite using the supplied catalog number or various search terms). Furthermore, it is not clear if the condensates observed are Treacle or potentially the Dextran crowder. Only small percentages (>1%-5%) of either Dextran or PEG are needed to induce phase separation in two-component mixtures of these polymers. PEG may be in the Treacle storage buffer. In addition to clarifying the state of recombinant Treacle, these concerns could be further assuaged by direct visualization of Treacle forming condensates (via fluorescent N-terminal tagging) and filling in more of the phase space to observe the loss of condensates at a threshold concentration of Treacle. In general, the gold standard for establishing condensation of a given protein is mapping the full binodal phase diagram of the protein. Understanding that protein is a limited resource, most groups simply map the lower concentration arm of the binodal, and this is sufficient to characterize a protein as having intrinsic condensation behavior. A similar mapping effort of Treacle would be welcomed.

      We thank the reviewer for their thoughtful comments and for highlighting concerns regarding the interpretation of our experiments with commercial recombinant Treacle. We recognize the importance of ensuring that the observed condensation properties are intrinsic to Treacle and not influenced by potential contaminants, storage buffer components, or tags on the protein.

      To address these concerns, we have re-evaluated the condensation properties of Treacle using a recombinant fragment independently purified in our laboratory. Specifically, we expressed and purified a Treacle fragment (amino acids 291–426), which includes two S/E-rich low-complexity regions (LCRs) and two linker regions, in E. coli. The protein was expressed as a TEV-cleavable maltose-binding protein (MBP) fusion, purified under native conditions via amylose resin, and subjected to TEV cleavage. This was followed by ion-exchange chromatography and extensive dialysis to remove any remaining impurities. These additional steps ensured that the purified Treacle fragment was of high purity and free from confounding components, such as polyethylene glycol (PEG). We have included detailed descriptions of this protocol in the revised manuscript.

      Using this purified Treacle fragment, we confirmed its intrinsic condensation behavior in vitro. In the presence of 5% PEG8000 as a crowding agent, the fragment formed liquid-like condensates that exhibited spherical morphology and dynamic fusion events, key hallmarks of liquid-liquid phase separation (LLPS). Additionally, we demonstrated that the condensation of this Treacle fragment was sensitive to changes in pH and salt concentration but unaffected by 1,6-hexanediol treatment, suggesting that the condensates are stabilized predominantly by electrostatic interactions (Fig. 4B of the revised manuscript). Importantly, these findings provide robust evidence that Treacle possesses intrinsic phase-separation properties. All results from the commercial Treacle protein used in the initial version of the manuscript have been replaced with data obtained using this independently purified recombinant fragment.

      We understand that the condensation behavior of the fragment may not fully capture the behavior of full-length Treacle. Nevertheless, the in vitro experiments provide valuable mechanistic insights into the biophysical properties of Treacle. Furthermore, as emphasized in the revised manuscript, our study primarily focuses on understanding the condensation and functional role of Treacle in a cellular context, where we observe its critical involvement in organizing nucleolar structure and regulating rRNA transcription. These cellular experiments highlight the biological relevance of Treacle’s condensation behavior.

      With regard to mapping the binodal phase diagram of Treacle, we concur with the reviewer that such an effort would be ideal for a more comprehensive characterization of Treacle’s condensation properties. However, the limited availability of purified protein currently precludes a detailed mapping effort. Despite this limitation, we believe the qualitative assessments of Treacle’s condensation under varying conditions, now included in the revised manuscript, sufficiently demonstrate its intrinsic ability to phase-separate.

      In conclusion, we are grateful for the reviewer’s feedback, which has allowed us to refine our methodology and strengthen the evidence supporting the intrinsic condensation properties of Treacle. We are confident that the revised manuscript provides a robust and thorough characterization of Treacle’s phase-separation behavior and its functional role in the cell, addressing the reviewer’s concerns. Thank you for your constructive recommendations, which have significantly improved the quality of our work.

      Replacing 'liquid-phase' and 'liquid' with 'liquid-like' would make the language consistent with other papers in the field and more accurately reflect the degree of material state analysis carried out in the study.

      We thank the reviewer for this insightful recommendation. In response to the suggestion, we have revised the manuscript to replace the terms "liquid-phase" and "liquid" with "liquid-like" throughout the text. This change ensures consistency with terminology commonly used in the field and more accurately reflects the degree of material state analysis performed in our study. We believe this adjustment improves the clarity and precision of our findings, aligning the manuscript with standard practices in the field. Thank you for helping us enhance the quality of the presentation.

      The 'unclear' nature of the condensation behavior of the FC phase of the nucleolus is listed as a motivation for carrying out the study in the introduction; the authors could note here two recent papers that have investigated the nature of FC condensation: Jaberi-Lashkari et al. 2023 and King et al. 2024. The reviewer notes that while these were both pre-printed in late 2022, they were only recently published.

      We thank the reviewer for bringing these recent studies to our attention. In response to the suggestion, we have cited the papers by Jaberi-Lashkari et al. (2023) and King et al. (2024) in both the introduction and discussion sections of the revised manuscript. These references are highly relevant to the context of our study and provide valuable insights into the condensation behavior of the FC phase of the nucleolus. We agree that incorporating these works strengthens the framing of our study and situates it more effectively within the broader field. Thank you for this constructive recommendation.

      The statement that Treacle is "the main molecule present in the FC" is a substantial claim that does not need to be made to promote the authors' case, nor is it well supported by the provided reference (Gal et al., 2022).

      We thank the reviewer for pointing out this overstatement in our original manuscript. In response, we have revised the text to provide a more accurate and well-supported description. Specifically, we have replaced the claim that Treacle is "the main molecule present in the FC" with a statement highlighting its direct interactions with UBF and RNA Pol I, as well as its colocalization with these proteins within the FC. This revision ensures alignment with the provided references and more accurately reflects the current understanding of Treacle's role in the FC. We appreciate the reviewer's attention to this detail, which has helped us improve the clarity and accuracy of our manuscript.

      The statement that "Treacle is one of the most intrinsically disordered proteins" is vague and unnecessarily grand. Treacle is a fully intrinsically disordered protein; these comprise 5% of the human proteome (Tsang et al. 2020), so Treacle is, indeed, unusual in that regard.

      We thank the reviewer for highlighting the vague and unnecessarily broad nature of the original statement. In response, we have revised the text to provide a more precise and accurate description of Treacle's structural properties. Specifically, we replaced the claim that "Treacle is one of the most intrinsically disordered proteins" with the statement that "According to protein structure predictors (e.g., AlphaFold, IUPred2, PONDR, and FuzDrop), Treacle is a fully intrinsically disordered protein." This wording reflects the unique nature of Treacle while remaining scientifically accurate and supported by reliable computational predictions. We appreciate the reviewer's feedback, which has allowed us to improve the rigor and clarity of our manuscript.

      A comment on the implications of the immobile pool of Treacle (which appears to be ~50% in WT and across a range of mutants) would be welcome. Additionally, the limitations of FRAP for interrogating material properties of condensed material in living systems are provided in Goetz and Mahamid, 2020. In this paper, the authors review instances where the ultrastructure of condensate is known and where FRAP data is available. They show that crystalline assemblies can recover faster than apparently liquid, spherical assemblies. A comment in the text about how these limitations apply to this study would be welcome.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the immobile pool of Treacle and the limitations of FRAP for characterizing material properties in living systems. As noted in our response to the public review, we believe the ~50% recovery rate after photobleaching observed in our experiments is best explained by the redistribution of Treacle molecules within the condensate, rather than significant exchange with the surrounding phase. This interpretation is strongly supported by the full- and half-FRAP analyses included in the revised manuscript, which demonstrated internal mixing dynamics within the condensates.
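      For reference, the mobile and immobile fractions discussed here are conventionally read off a FRAP curve from three intensities: the pre-bleach level, the level immediately after bleaching, and the recovery plateau. The sketch below shows that standard calculation only; it is not taken from the manuscript, and the function name and example intensities are hypothetical.

```python
def frap_fractions(pre_bleach, post_bleach, plateau):
    """Return (mobile, immobile) fractions from three FRAP intensity readings.

    mobile = (plateau - post_bleach) / (pre_bleach - post_bleach)
    """
    if pre_bleach <= post_bleach:
        raise ValueError("pre-bleach intensity must exceed post-bleach intensity")
    mobile = (plateau - post_bleach) / (pre_bleach - post_bleach)
    return mobile, 1.0 - mobile
```

      With hypothetical readings of 100, 20, and 60 (arbitrary units), the recovery covers half of the bleached signal, which corresponds to the roughly 50% immobile pool the reviewer refers to. As the reviewer notes, such a fraction alone does not establish the material state of the condensate.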

      There appears to be a typo in the following sentence: "The highly positively charged CD serves as the nucleation center for RD but exhibits ambivalent phase properties, transitioning from LLPS to LSPS in the absence of rRNA." The LLPS to LSPS behavior was observed for mutants to the central domain (RD), not the c-terminal domain (CD).

      Throughout, the authors report single snapshots of representative cells and single line traces. Analysis of the key morphological feature across the population of cells would help the reader understand how widespread the observed phenotype is.

      We thank the reviewer for raising this important point regarding the representation of morphological features across the cell population. To address this concern, we have included widefield micrographs of cell fields in the revised figures to provide a more comprehensive view of the phenotypes observed.

      The statement that "The phase behavior of polymers is determined by interactions through associative motifs, referred to as stickers, separated by spacers, which are not the primary driving forces for phase separation" could be improved by pointing out that this is potentially incomplete for describing the kind of condensation that highly charged polymers undergo. The high charge and charge segregation of Treacle suggest that it is a blocky polyampholyte and that it condenses by coacervation. Models of associative polymers can be useful for describing coacervation; however, the driving forces for coacervation are less understood and have been proposed to include an entropic component (see Sathyavageeswaran et al. 2024, Sing and Perry 2020 and work from their groups as well as the Obermeyer (Columbia) and Tirrell (U. Chicago) Groups).

      We thank the reviewer for highlighting this important aspect of the phase behavior of charged polymers and for suggesting relevant references. In response, we have revised the discussion section of the manuscript to include a more nuanced explanation of the condensation mechanisms for highly charged polymers such as Treacle. Specifically, we now describe Treacle as a blocky polyampholyte, suggesting that its condensation behavior may be driven by coacervation mechanisms. The relevant references have been added to the discussion section of the revised manuscript.

      In addition to the above, the authors may consider citing two recent publications from the Pappu group (King et al. Cell 2024 and King et al. Nucleus 2024) that directly investigate the condensation potential of K-rich and E/D-rich 'grammars' on nucleolar proteins and show, as the authors do, that the K-rich region is essential for localization and is conserved across nucleolar proteins.

      We thank the reviewer for bringing these relevant publications to our attention. The suggested references from the Pappu group (King et al., Cell 2024, and King et al., Nucleus 2024) have been added to the introduction and discussion sections of the revised manuscript, and their findings have been appropriately integrated into our analysis.

      The authors could consider replacing the use of LLPS with a more generic term such as "condensation" or "biomolecular condensation." LLPS of polymers is a segregative transition driven by its incompatibility with the surrounding solvent. As indicated, Treacle is likely to be undergoing some form of coacervation (which is predominantly an associative transition), which can be generically described as condensation. See Pappu et al. 2023 for more details.

      We thank the reviewer for their insightful suggestion. Following the reviewer's recommendation, we have replaced the term "LLPS" with "condensation" or "coacervation" throughout the manuscript, where appropriate. Additionally, we have referenced Pappu et al. (2023) and others to provide further context and clarity regarding the distinctions between these terms.

      The authors cite Yao et al. 2019, but do not cite the follow-up study (Wu et al. 2021) or provide a statement on how the Chan group finds a role for the RGG domain of FBL in keeping the certain canonical markers of the FC and DFC de-mixed.

      We thank the reviewer for pointing out these important references. The relevant citations, including Wu et al. (2021), have been added to the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      The following comment is true but could be broadened to include examples of structured regions promoting biomolecular condensation. "In biological systems, phase separation is mainly a characteristic of multivalent or intrinsically disordered proteins (Banani et al, 2017; Shin & Brangwynne, 2017; Uversky, 2019)."

      We have expanded the statement as recommended by the reviewer: "In biological systems, phase separation is facilitated by a combination of multivalent interactions mediated by intrinsically disordered proteins and site-specific interactions that drive percolation."

      Related to Figure 1.

      The authors report Treacle-dependent EU incorporation (Figure 1D), but are there any changes more broadly to nucleolar number or size as a consequence? How do the authors interpret that the quantitative effect of AMD treatment is more extreme than Treacle depletion (Figure 1E)?

      We thank the reviewer for raising these important points. Regarding nucleolar number and morphology, we did not observe a change in the number of nucleoli upon Treacle depletion. However, nucleoli appeared more regularly rounded under these conditions, which we interpret as a consequence of the decreased rDNA transcription activity caused by Treacle depletion. A similar rounding of nucleoli is also observed upon actinomycin D (AMD) treatment, which is consistent with reduced transcriptional activity.

      As for the more pronounced effect of AMD compared to Treacle depletion on EU incorporation, this can be explained by the fundamentally different mechanisms through which these conditions affect transcription. Treacle depletion reduces the local concentration of transcription factors at rDNA sites, thereby impairing transcription initiation and elongation to a certain extent. However, under Treacle depletion, RNA polymerase I still retains the ability to bind to the promoter and support a residual level of transcription. In contrast, AMD acts as a potent intercalator in GC-rich regions of rDNA, physically blocking the ability of RNA polymerase I to move along rDNA, resulting in near-complete cessation of rRNA synthesis.

      Related to Figure 2.

      The authors observe that AMD leads to coalescence of individual Treacle-2S+ bodies (e.g. Figure 2E) - does this suggest that ongoing rRNA transcription is required to prevent such events?

      Thank you for your thoughtful question. Indeed, our observations strongly suggest that ongoing rRNA transcription is required to prevent the coalescence of Treacle-2S+ bodies, as observed upon AMD treatment. This interpretation aligns with the findings of Tetsuya Yamamoto et al., who demonstrated that nascent ribosomal RNA (pre-rRNA) acts as a surfactant to suppress the growth and fusion of fibrillar centers (FCs) in the nucleolus. Their work highlighted that nucleolar condensates formed via liquid-liquid phase separation (LLPS) tend to grow to minimize surface energy, provided sufficient components are available. However, the transcription of pre-rRNA stabilizes FCs by maintaining multiple microphases, preventing coalescence unless transcription is inhibited.

      According to Yamamoto et al., nascent pre-rRNAs tethered to FC surfaces by RNA Polymerase I generate lateral pressure that counteracts interfacial tensions, effectively suppressing FC fusion. This activity is analogous to the surfactant properties of molecules in physical systems. When transcription is inhibited (e.g., by AMD), the loss of nascent rRNA allows condensates to coalesce, consistent with the behavior we observe.

      We further propose that the AMD-induced coalescence of Treacle-2S+ bodies reflects the loss of this surfactant-like effect, as transcriptional activity ceases. This theory is also supported by the observation that Treacle condensates in the nucleoplasm, where rRNA transcription is absent, form larger structures. Collectively, these insights highlight the critical role of ongoing rRNA transcription in maintaining the structural integrity and dynamic organization of nucleolar substructures.

      Related to Figure 3.

      In the figure panels B-H the DAPI signal in gray obscures the Treacle localization, especially in Figure 3H. A non-merged image for each of these examples for the Treacle localization would be very helpful.

      We thank the reviewer for this observation. To address this, we have included wide-field images without the DAPI overlay for the deletion mutant lacking the 1121-1488 region. These are now presented in Supplementary Figure S5G of the revised manuscript.

      Related to Figure 5.

      Only a single representative nucleus is shown in the PLA analysis presented in Figure 5B.

      Quantification to assess the robustness of this response with the addition of VP16 is needed. The authors use ChIP and immunocytochemistry as orthogonal methods but it would be best to therefore show both for each manipulation that is performed - the immunostaining of TOPBP1 in the Treacle KD cells in S5A should be in the main Figure 5 to complement transformation of constructs as in Figure 5D.

      We appreciate the reviewer’s comment. To address this, we performed a quantitative analysis of PLA fluorescence signals in control and etoposide-treated cells, and the results are now presented in Supplementary Figure S8C. Additionally, as recommended, we have transferred the results of the immunocytochemistry of TOPBP1 in Treacle KD and Treacle KN cells to the main figure, now included as Figures 7D-E in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary.

      In this meticulously conducted study, the authors show that Drosophila epidermal cells can modulate escape responses to noxious mechanical stimuli. First, they show that activation of epidermal cells evokes many types of behaviors including escape responses. Subsequently, they demonstrate that most somatosensory neurons are activated by activation of epidermal cells, and that this activation has a prolonged effect on escape behavior. In vivo analyses indicate that epidermal cells are mechanosensitive and require the store-operated calcium channel Orai. Altogether, the authors conclude that epidermal cells are essential for nociceptive sensitivity and sensitization, serving as primary sensors of noxious stimuli.

      Strengths.

      The manuscript is clearly written. The experiments are logical and complementary. They support the authors' main claim that epidermal cells are mechanosensitive and that epidermal mechanically evoked calcium responses require the store-operated calcium channel Orai. Epidermal cells activate nociceptive sensory neurons as well as other somatosensory neurons in Drosophila larvae, and thereby prolong escape rolling evoked by mechanical noxious stimulation.

      Weaknesses.

      Core details are missing in the protocols, including the level of LED intensity used, which are necessary for other researchers to reproduce the experiments. For most experiments, the epidermal cells are activated for 60 s, which is long when considering that nocifensive rolling occurs on a timescale of milliseconds. It would be informative to know the shortest duration of epidermal cell activation that is sufficient for observing the behavioral phenotype (prolongation of escape behavior) and activation of sensory neurons.

      (1) We agree with the reviewer that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript.

      (2) The Reviewer asks about the shortest duration of epidermal cell activation sufficient for observing the behavior phenotype. We note in the manuscript that behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation.

      Reviewer #1 (Recommendations):

      (1) The epidermal cells in this study are activated for 60 s. In the real world, the nociceptive stimulation (a poke, such as penetration by the ovipositor of a parasitic wasp) that evokes escape rolling is short. Does optogenetic activation of 1 s or less still evoke rolling? For example, it is unclear in Figure 4K how long the epidermal cells need to be activated before the poke stimulus prolongs rolling. Is it possible to test behavior and GCaMP activity in sensory neurons when epidermal cells are briefly (1 second) activated?

      As described above, behavioral responses to optogenetic epidermal stimulation are apparent within 2 seconds of stimulus (see Figure 2F); this is consistent with our calcium imaging data in which C4da response reaches its maximum within 2-3 sec of stimulation. The kinetics are consistent with a role for epidermal cells in modulating neuronal responses to nocifensive stimuli, and similar to the response kinetics observed in mammalian epidermal cells that modulate neuronal touch and pain responses (Maksimovic et al., 2014; Woo et al., 2014; Mikesell et al., 2022).

      (2) The protocol for optogenetic screening states that the authors used a 488-nm LED. Why was a 488-nm LED used instead of the 610-nm LED for Chrimson activation? No information (except figure 4K) about the light intensity is provided in the figure legend or the protocol section. Please state the LED intensity used for all optogenetic experiments (GCaMP imaging, behavioral experiments, etc.).

      We used 488 nm light for the initial screen for technical reasons. The screen was conducted by students at the MBL Neurobiology course (hence the affiliation; student authors are included in the manuscript), and the only LED available to us at that time delivered insufficient illumination at longer wavelengths to be useful. We chose to include the students’ data because (1) we found that the 488 nm light alone did not induce rolling in our setup, (2) we repeated and extended the studies with the epidermal drivers using a higher resolution imaging platform and longer wavelength stimulation (all studies other than Fig. 1), and (3) we observed qualitatively similar results when we repeated stimulation with all drivers using 561 nm light.

      We agree that the LED intensity is an important detail of the experimental paradigm. We updated the methods to include intensity measurements for the stimuli used throughout the manuscript. We also include the intensities here:

      - 30 μW/mm^2 for calcium imaging experiments Fig 3B-E, Fig 4A, Fig 3S1A-D, Fig 4S1A

      - 300 μW/mm^2 for behavior studies in Fig 2B-E, Fig 1S6, Fig 2S1, Fig 3E-F, Fig 3S2A-C

      - 25 μW/mm^2 for behavior studies in Fig 4E-J

      - 1.16 μW/mm^2 for behavior studies in Fig 4K

      (3) Lines 150 - 152: Although the authors refer to "a stereotyped behavior sequence" in Fig 2D, there are no data supporting this claim in Fig 2. Rather, the data appear to represent proportions of different types of behavior at each time point, rather than behavior sequences. If the authors wish to claim that the data show stereotyped behavior sequences, they should analyze the data using a different method (e.g., Markov models).

      We agree that in the absence of additional analysis we should avoid commenting on stereotypy of behavior sequences; we therefore adjusted the text to reflect the tendency of nociceptive behaviors to precede non-nociceptive behaviors. The raster plots shown in Supplemental Fig. 2A illustrate this point: in larvae exhibiting nociceptive behaviors, these behaviors appear first, followed by backing and frequently freezing. As one quantitative readout of this sequence we show that the latency of rolling (nociceptive) is shorter compared with backing or freezing (non-nociceptive) (Fig. 2F, Fig. S2G).
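      For readers who want to pursue the sequence analysis the reviewer mentions, a first-order transition table is one simple starting point before fitting a full Markov model. The sketch below is not an analysis from the manuscript: the function name, the behavior labels, and the example sequences are all hypothetical, and each sequence is assumed to be one larva's behaviors in temporal order.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between behaviors.

    `sequences` is an iterable of per-animal behavior lists in temporal order.
    Returns {current_behavior: {next_behavior: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each behavior with the one that immediately follows it.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nxt_counts.values()) for nxt, n in nxt_counts.items()}
        for state, nxt_counts in counts.items()
    }


# Hypothetical example: three larvae, nociceptive rolling first in each.
example = [
    ["roll", "back", "freeze"],
    ["roll", "back", "back"],
    ["roll", "freeze"],
]
table = transition_probabilities(example)
```

      In a table like this, a tendency for nociceptive behaviors to precede non-nociceptive ones would show up as high probabilities out of the rolling state and an absence of transitions back into it, which complements the latency comparison described above.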

      (4) Figure 3A-E: a cursory glance at the data suggests that the most responsive sensory neurons are C1da, with all sensory neurons activated. However, at the behavioral level, only some sensory neurons are activated. If all sensory neurons were activated by Chrimson, what behavioral phenotypes would the authors expect to see? Would it be the same as epidermal activation?

      The Reviewer raises an interesting question, but we intentionally avoid comparing the response properties among sensory neurons because of differences in driver strength. Likewise, extrapolating “activation” at the behavioral level is exceedingly difficult if/when multiple sensory neurons are simultaneously activated. In response to the Reviewer’s specific question, when all da neurons are activated simultaneously, larvae largely exhibited hunching rather than rolling (Hwang et al., 2007). We find that epidermal stimulation rarely elicits hunching; instead, epidermal stimulation generally triggers nocifensive behaviors followed by non-nocifensive behaviors such as backing and freezing, suggesting an order or priority in neurons activated by epidermal cells (or different response times). Defining the mechanisms by which epidermal cells communicate with different types of sensory neurons is therefore a top priority for future studies.

      (5) Figure 3S2: The behavior phenotypes in Fig. 3E, F and Fig 3S2 seem slightly different. I suggest adding some comments on how the behavior phenotypes differ depending on the GAL4 driver. Specifically, is there increased freezing in some genotypes (e.g., ppk-LexA or NompC-lexA)? Can you show this without TNT data? Is this a background effect or a specific GAL4 phenotype?

      We currently do not have the driver-only control for this experiment, but our effector-only control experiment (see Fig. 3S2A) suggests that larvae carrying the AOP-TNT insertion exhibit enhanced nociceptive behavioral responses. This point is addressed in our manuscript by the following (copied from the figure legend):

      “We note that although baseline rolling probability is elevated in all genetic backgrounds containing the AOP-LexA-TnT insertion, silencing C4da and C3da neurons significantly attenuates responses to epidermal stimulation.”

      (6) Calcium-free solution is used in Figure 3. Why do the authors still observe calcium influx? Does this mean that internal calcium stores are released? If so, does the calcium influx represent an action potential? How do the authors focus their LED stimulation to activate epidermal cells and avoid activation of the imaging laser?

      The specimens were imaged in calcium-free solution to minimize movement artifacts. However, the CNS is wrapped by glial cells, and we speculate that extracellular calcium persists in the CNS over short timescales such as those used for imaging.

      (7) It is unclear when animals begin to crawl after the epidermal cells are mechanically stimulated. How do the authors distinguish between peristaltic crawling and a poke by Orai receptors? Although the in vitro experiments beautifully show radial tensions, it is unclear to what extent A-P axis tension (peristaltic crawling) and radial tension (poke) differ. It might be helpful to explain in the discussion section how epidermal cells are selectively activated.

      The Reviewer raises an interesting question about the types and thresholds of forces required to elicit epidermal responses. We cannot eliminate the possibility that peristaltic crawling (or crawling through a 3D substrate) stimulates epidermal cells to a certain degree. Indeed, our results demonstrate a dose-dependent response of Drosophila epidermal cells and human keratinocytes to radial stretch. However, we do not have any information about selectivity in response to different stimuli, though we agree that this is an intriguing avenue for future studies. For example, we don't know whether stretch-responsive cells are more or less responsive to poke. A salient feature of our studies, however, is the recruitment of greater numbers of responders with increasing stimulus intensity; we therefore added the following statement to the discussion to clarify our model:

      “Finally, we find that epidermal cells exhibit a dose-dependent response to radial stretch; we therefore anticipate that the output of epidermal cells is likewise dependent on the stimulus intensity. Hence, rather than a fixed threshold beyond which epidermal cells are selectively activated, we hypothesize that increasing stimulus intensities drive increasing signal outputs to neurons.”

      (8) Some Protocols are missing. For example, in Figure 4, many stimulus combinations were used to test behavior. How were stimuli of different modalities applied to the animals? Further details need to be provided in the protocols.

      We thank the Reviewer for identifying this oversight. The methods section of our original submission detailed most of the stimulus combinations but omitted the opto + mechano combination (4F). We updated our methods to correct these omissions.

      (9) It might be helpful if the authors could provide a sample video for each behavior to clarify how they were each defined.

      Our manuscript includes a table with a detailed description of the behaviors (Table S2), and we added two annotated videos that show representative behavioral responses to optogenetic nociceptor or epidermis stimulation.

      (10) A supplementary summary table of genotypes might be helpful for the reader.

      Experimental genotypes are provided in the figure legends, and a detailed list of all alleles used in the study as well as their source is provided in supplemental table S1.

      Reviewer #2 (Public Review):

      Summary.

      The authors provide compelling evidence that stimulation of epidermal cells in Drosophila larvae results in the stimulation of sensory neurons that evoke a variety of behavioral responses. Further, the authors demonstrate that epidermal cells are inherently mechanoresponsive and implicate a role for store-operated calcium entry (mediated by Stim and Orai) in the communication to sensory neurons.

      Strengths.

      The study represents a significant advance in our understanding of mechanosensation. Multiple strengths are noted. First, the genetic analyses presented in the paper are thorough with appropriate consideration to potential confounds. Second, behavioral studies are complemented by sophisticated optogenetics and imaging studies. Third, identification of roles for store-operated calcium entry is intriguing. Lastly, conservation of these pathways in vertebrates raises the possibility that the described axis is also functional in vertebrates.

      Weaknesses.

      The study has a few conceptual weaknesses that are arguably minor. The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed. On a related note, the kinetics of store-operated calcium entry is very distinct from that required for SV release. The link between SOC and epidermal cells-neuron transmission is not reconciled. Finally, it is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      (1) The involvement of store-operated calcium entry implicates ER calcium store release. Whether mechanical stimulation evokes ER calcium release in epidermal cells and how this might come about (e.g., which ER calcium channels, roles for calcium-induced calcium release etc.) remains unaddressed.

      Our studies suggest that mechanically evoked responses in epidermal cells involve both ER calcium release and store-operated calcium entry. Notably, we show that depletion of ER calcium stores before mechanical stimulation, by treating with thapsigargin, reduces (but does not eliminate) mechanically evoked calcium responses in fly epidermal cells (Fig. 6C-6F). Likewise, fly epidermal cells and human keratinocytes both exhibit mechanically evoked calcium responses in the absence of extracellular calcium (10 mM EGTA to chelate all free calcium ions). These data support a model whereby mechanical stimuli trigger both calcium release from ER stores and calcium influx. Indeed, several cell types have been shown to display mechanically evoked release of calcium from stores. For example, mechanical stimulation of enteroendocrine cells of the gut epithelium results in both calcium release from ER stores and calcium influx across the plasma membrane (Knutson et al., 2023). Similar to our findings, Knutson et al. found that depleting stores decreased mechanically evoked calcium signals by over 70% in these gut epithelial cells. In our revised manuscript we have more clearly emphasized these points.

      We agree with the reviewer that deciphering the mechanisms by which mechanical stimuli promote ER calcium release and subsequent store-operated calcium entry is an exciting topic to explore. One potential mechanism is the activation of a mechanosensitive receptor that promotes calcium release from the ER via calcium-induced calcium release or IP3 production, as has been proposed for enteroendocrine cells. A recent paper demonstrated that the ER itself is mechanosensitive and that mechanical stimuli promote calcium release via the opening of calcium-permeable ion channels in the ER membrane (Song et al., 2024). Determining the relative contributions of store-operated calcium entry and ER calcium release and deciphering their underlying mechanisms will require a thorough investigation of ER calcium channels and receptors; we therefore believe this is beyond the scope of the present manuscript and merits publication on its own. However, we now include this in our discussion as an exciting new direction we aim to pursue.

      (2) The kinetics of store-operated calcium entry is very distinct from that required for SV release. The link between SOC and epidermal cells-neuron transmission is not reconciled.

      The Reviewer raises an interesting point regarding the mode of epidermal cell-neuronal communication. We demonstrated a requirement for dynamin-dependent vesicle release from epidermal cells in mechanical sensitization. However, the nature of the vesicular pool, the mode and kinetics of release, and the type of neuromodulator released remain to be characterized. Hence, it’s not clear that kinetics of synaptic vesicle release is an appropriate comparison. Our studies do demonstrate that behavioral responses to optogenetic epidermal stimulation are relatively slow – on the order of seconds – which is not incompatible with the kinetics of store-operated calcium entry. Furthermore, the primary functional output we define for epidermal mechanosensory responses, mechanical nociceptive sensitization, is apparent 10 sec following the stimulus and persists for minutes in our behavior assays. Consistent with this model, studies of the mammalian touch dome have shown that touch-sensitive Merkel cells secrete neurotransmitters to modulate neurons and promote sustained action potential firing on a similar timescale. Likewise, mechanically evoked ER calcium-release promotes sustained secretion of serotonin from enterochromaffin cells.

      (3) It is not clear how optogenetic stimulation of epidermal cells results in the activation of SOC.

      We appreciate the opportunity to clarify our results. We demonstrate that optogenetic epidermal stimulation elicits behavioral responses in larvae and calcium responses in somatosensory neurons, but we do not claim that optogenetic epidermal stimulation elicits SOC. Our optogenetic studies demonstrate the capacity for epidermal stimulation to modulate somatosensory function, but we characterize contributions of SOC only to mechanical stimuli which are more physiologically relevant. However, it is worth noting that CsChrimson is a calcium-permeable channel, suggesting that an increase in intracellular calcium may trigger epidermal-evoked neuronal responses and behaviors during optogenetic stimulation.

      References

      Hwang, RY, Zhong, L, Xu, Y, Johnson, T, Zhang, F, Deisseroth, K, and Tracey, WD (2007). Nociceptive neurons protect Drosophila larvae from parasitoid wasps. Curr Biol 17, 2105–2116.

      Knutson, KR, Whiteman, ST, Alcaino, C, Mercado-Perez, A, Finholm, I, Serlin, HK, Bellampalli, SS, Linden, DR, Farrugia, G, and Beyder, A (2023). Intestinal enteroendocrine cells rely on ryanodine and IP3 calcium store receptors for mechanotransduction. J Physiol 601, 287–305.

      Maksimovic, S, Nakatani, M, Baba, Y, Nelson, AM, Marshall, KL, Wellnitz, SA, Firozi, P, Woo, S-H, Ranade, S, Patapoutian, A, et al. (2014). Epidermal Merkel cells are mechanosensory cells that tune mammalian touch receptors. Nature 509, 617–621.

      Mikesell, AR, Isaeva, O, Moehring, F, Sadler, KE, Menzel, AD, and Stucky, CL (2022). Keratinocyte PIEZO1 modulates cutaneous mechanosensation. Elife 11, e65987.

      Song, Y, Zhao, Z, Xu, L, Huang, P, Gao, J, Li, J, Wang, X, Zhou, Y, Wang, J, Zhao, W, et al. (2024). Using an ER-specific optogenetic mechanostimulator to understand the mechanosensitivity of the endoplasmic reticulum. Dev Cell 59, 1396-1409.e5.

      Woo, S-H, Ranade, S, Weyer, AD, Dubin, AE, Baba, Y, Qiu, Z, Petrus, M, Miyamoto, T, Reddy, K, Lumpkin, EA, et al. (2014). Piezo2 is required for Merkel-cell mechanotransduction. Nature 509, 622–626.

    1. Author response:

      We appreciate the reviewers’ constructive comments and suggestions. We plan the following revisions to address the public reviews.

      Regarding model selection (from Reviewers 1 and 3)

      We will test whether the latent cause model has a better explanatory power for the observed reinstatement data compared with at least two other models, including the Rescorla-Wagner model. For each model, the prediction errors across all trials and those in the test 3 trial (reinstatement) will be calculated for individual animals. The explanatory power of the models will be discussed based on these results. 
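
      As an illustration of the kind of comparison described above, the sketch below implements a single-cue Rescorla-Wagner update and a mean squared error between model predictions and observed conditioned responses. It is a minimal sketch under stated assumptions, not the authors' fitting procedure: the learning rate, the trial outcomes, and the function names are hypothetical, and the associative strength is assumed to map directly onto the predicted CR.

```python
def rescorla_wagner(outcomes, alpha=0.5, v0=0.0):
    """Single-cue Rescorla-Wagner model.

    outcomes: per-trial outcome (1.0 = US present, 0.0 = US absent).
    alpha: learning rate (hypothetical value).
    Returns (predictions, errors): the associative strength before each
    trial and the prediction error (outcome - prediction) on that trial.
    """
    v = v0
    predictions, errors = [], []
    for outcome in outcomes:
        delta = outcome - v      # prediction error on this trial
        predictions.append(v)
        errors.append(delta)
        v += alpha * delta       # update associative strength
    return predictions, errors


def mean_squared_error(observed, predicted):
    """Average squared discrepancy between observed and predicted CR."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical schedule: three conditioning trials followed by three extinction trials.
predictions, errors = rescorla_wagner([1.0, 1.0, 1.0, 0.0, 0.0, 0.0], alpha=0.5)
```

      The same `mean_squared_error` helper could be applied to all trials or to a single test trial for each animal, and then compared across candidate models.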

      Regarding model validation (from Reviewers 1, 2, and 3)

      We acknowledge the reviewers’ concerns about potential parameter overfitting and misinterpretation. First, we will run the simulation in the latent cause model under other possible conditions to test whether our original condition can be justified, and then clarify how certain parameters affect the predicted CR. Second, we will confirm whether the prediction errors are comparable between experimental groups, present the correlation between parameters, and discuss this result in the revision. 

      To evaluate the effect of context in explaining reinstatement in the latent cause model, simulations of CR in test 3 when only context or tone is presented will also be performed and discussed with the behavioral data.

      Regarding the interpretation of the behavioral data (from Reviewers 1, 2, and 3)

      We will clarify our interpretation of the behavioral data by incorporating the additional analyses mentioned above; for example, to clarify the contribution of context in test 3, we will provide data on the CR before the tone presentation in our revision. In addition, how we expected and interpreted the reversal Barnes maze results from the memory modification characteristics estimated in the reinstatement test will be further discussed.

      Regarding the application of the latent cause model to the reversal Barnes maze task (from Reviewers 1, 2)

      We acknowledge the reviewers’ suggestions to apply the latent cause model to our Barnes maze results to strengthen the link and consistency. To further clarify the reason for including Barnes maze results, we will explicitly discuss how associative learning is involved in spatial learning in the revision. However, we will not be able to directly apply the latent cause model to the Barnes maze data for the following reasons. As we noted in the Results and Discussion, the latent cause model was built on associative learning and cannot be directly applied to the Barnes maze data. The cognitive processes in the Barnes maze task involve maintaining a spatial representation of the environment, integrating the animal’s own position and the expected goal, and evaluating potential actions. Importantly, the chosen actions in this task directly affect subsequent observations, while an animal’s response based on an expected outcome typically does not alter future observations in a simple associative learning paradigm. 

      Thus, although associative learning (e.g., associations between the spatial cue and the location of the escape box) is certainly a critical building block and contributes to performance in the Barnes maze task, this mechanism alone cannot fully explain the animal’s navigation in the maze. We agree that having solid modeling results in the reversal Barnes maze task is an important direction, but extending the latent cause model for this purpose is beyond the scope of this study. We have suggested some possible approaches in the Discussion and will elaborate further on these conceptual distinctions and on how the latent cause framework assists in the interpretation of results.

    1. Author response:

      We thank the reviewers for their insightful feedback. Incorporating their recommendations will greatly enhance our manuscript for resubmission. Based on the reviews, a major challenge to the interpretation of our study is whether locomotion itself is responsible for increased ACC activity during our task. This was a shared concern for us during our analysis. We included data in our initial submission hoping to address these concerns. Specifically, we show that post-action activity outlasts movement termination, in most cases on the order of seconds after termination (Supplementary Fig 2). Likewise, post-action activity is not tied to shuttle initiations, as ACC activity onset can vary greatly before and after initiation (Supplementary Fig 2). Lastly, the unique nature of action content neurons further supports a distinction from locomotor activity. They selectively fire for specific directions and, as a result, do not fire during movement in opposite directions. Despite these findings, we agree with the reviewers that inclusion of additional analyses, such as examining firing rates with respect to locomotion speed and acceleration/deceleration, will greatly strengthen our claim of ACC’s role in post-action activity. In our resubmission, we will seek to perform such an analysis, among others, to fully elucidate the role of locomotion in ACC post-action activity.

      Reviewers also pointed out an overall lack of details surrounding our task, analysis, statistical methods and experimental approaches. We will consider all the recommendations from the reviewers and integrate them into our resubmission to provide more detailed information. Notably, we will adjust our approach in describing our task. Reviewers discussed some criticism regarding the perceived novelty of the task as it shares many similarities with previous discrimination-avoidance tasks. The distinction with our task is regarding the nuance of how the meaning (safety vs shock) of the context and sensory stimuli dynamically changes based on the current environment (context x sound). This requires not only the discrimination of contextual and sensory stimuli but also the inter-modal integration of stimuli, which varies throughout the task. Sound A/B leads to different outcomes depending on the context, and similarly, the meaning of the context shifts in a sound-dependent manner.

      Lastly, in our follow-up submission we will work to include more robust analyses that make use of the temporal sensitivity of our recordings. We will also provide greater clarity on how each individual animal contributes to our overall findings. To conclude, we would like to once again thank our reviewers for their feedback and evaluation of our manuscript. We look forward to making the necessary adjustments for our future submission.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG/VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further sub-clustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      We thank the reviewer for identifying the strengths of our work.

      Weaknesses:

      The authors included a valuable control group - the PG/VG group, since PG/VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG/VG group, and the PG/VG group to the individual flavored e-cig groups will provide more clear insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that PG/VG is the foundation of the e-liquid formulation and hence that comparisons with this group are important for understanding the effect of individual flavors on the cell population. Although we compared the flavored e-cig groups with the PG/VG group, we did not discuss these comparisons in detail within the manuscript, to avoid confusion in interpreting such a large dataset. However, we will include the comparisons with the PG/VG group as a Supplementary File in our revised manuscript to facilitate interpretation of our omics data by interested readers.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail, as fruit-flavored e-liquids have now been regulated for sale in the US. Thus, from a regulatory point of view, the effects of tobacco- and menthol-flavored e-liquids hold the most interest. Since fruit flavors were on the market at the time this study was conducted, we have still included the data. However, studying them further was not the focus of this work. Nevertheless, interested readers of our manuscript can access our dataset to allow further analyses and interpretation of our results.

      The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al reported Ly6g+ macrophages in the lung which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we are in the process of designing neutrophil-specific experiments to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. However, to address the concerns raised by the reviewer, we are staining lung tissue samples from mice exposed to air and to differently flavored e-cig aerosols with Ly6G and S100A8 (universal marker for neutrophil) to assess the infiltration of Ly6g+ vs Ly6g- neutrophils within the lungs of exposed and unexposed mice. This would also address the question of whether these populations are neutrophils or belong to another myeloid lineage, as suggested by recent publications. We will share these results in the revised manuscript and update our interpretations accordingly with better validations.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology, however there are numerous flaws including a low number of replicates and a lack of effective validation methods which reduces the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      We appreciate the reviewer for recognizing the strength of this work.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown, it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for the critique, which allows us to improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions, but this was a pilot study to understand the changes in the mouse lung upon acute exposures to flavored e-cig aerosols at a single cell level. So far, the e-cig field has been primarily focused on conducting toxicological studies to help regulatory bodies to set standards and enforce laws to better regulate the manufacture, sale and distribution of e-cig products. However, adolescents and young adults are still getting access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposures. Single cell technology is a powerful tool to analyze the gene expression changes within cell populations to study cell heterogeneity and function. Yet it is a costly tool, which makes conducting such analyses on large sample sizes difficult. This pilot study was designed to get some initial leads for future studies involving larger sample sizes and chronic exposures. Further, we still intend to share our results with the scientific community due to the value of such a dataset for a wider audience interested in learning about the mechanistic underpinnings of e-cig exposures in vivo.

      We understand that the validations are limited in our current work, and so we are in the process of conducting immunostaining to validate a few targets identified through this work. We also want to add here that validating single cell findings using classical methods such as ELISA, qPCR or flow cytometry is sometimes difficult, as many of these techniques still interrogate the tissue as a whole, whereas the changes shown in single cell analyses mainly pertain to a single cell type. This could be a probable reason for the scRNA seq results not aligning with our findings from flow cytometry. The findings from this pilot study have now allowed us to be better informed to design an effective flow panel for our future studies. In terms of the statistics and the number of cells for each analysis, we will share a detailed account for each to allow better interpretation of our results.

      Only 71,725 cells means only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      We do agree that the number of cells could be too low, but to avoid this we never studied the gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations -myeloid, lymphoid, endothelial, stromal and epithelial- and identified the changes in the gene expressions. Of these, only two clusters (myeloid and lymphoid) with more than ~1000 cells per cell type per group were studied in detail. We will include the cell count information to allow better interpretation of our results in the revised manuscript.
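
      The reviewer's per-group and per-animal figures can be reproduced with the short calculation below. It is a sketch only: the number of groups (10) is an assumption inferred from the reviewer's arithmetic, and an even split across groups and animals is assumed purely for illustration, since real captures are rarely balanced.

```python
total_cells = 71_725
n_groups = 10           # assumption: inferred from 71,725 -> 7,172 per group
animals_per_group = 2   # "two biological n per group" in the review

# Integer division mirrors the rounding used in the reviewer's comment.
cells_per_group = total_cells // n_groups
cells_per_animal = cells_per_group // animals_per_group

print(cells_per_group, cells_per_animal)
```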

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      This is a well-made point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited and that, for low cell numbers, this could lead to bias. We are in the process of validating the findings regarding the presence of Ly6G+ and Ly6G- cells in our control and treated lungs, the outcome of which will be discussed in the revised manuscript. We will also provide the cell number for the Ly6G- cell cluster for each sample with a more detailed discussion of our findings. Due to the small sample size and cell capture, a few limitations are hard to overcome; these will be further elaborated upon in our revisions.

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA seq findings would be interesting. However, flow cytometry and the single cell suspension preparation for sequencing were performed in parallel for this study. We used a basic flow panel with single markers to identify individual immune cell types. We did identify changes in the Ly6G population in our treated and control samples using scRNA seq and intend to include it as a marker for our future studies using flow cytometry. Unfortunately, the same analyses could not be performed for the current batch of samples. We will still include results from IHC staining to identify the Ly6G+ and Ly6G- populations in the lung tissues from control and treated mice in the revised manuscript to address some of the concerns raised here.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove the excess RBCs during lung digestion for preparation of the single cell suspension for scRNA seq in this study. Reports suggest that RBC lysis could adversely affect eosinophil number and function. We did not identify any cell cluster expressing eosinophil markers in our scRNA seq data, and we believe that our lung digestion protocol could be the reason. We have studied the changes in eosinophil number through flow cytometry in these samples and have found significant changes as well. However, because we could not find eosinophil clusters in the scRNA seq data, we did not include these results in the final manuscript. To avoid confusion and maintain transparency, we will include our results from flow cytometry experiments in the revised manuscript.

      The figures had no titles so were difficult to navigate.

      We will make necessary adjustments to the data representation and include the titles to enable easy navigation of the Figures.

      PG/VG is not defined and not introduced early enough.

      We agree that PG/VG is an important control to compare in e-cig studies. This was the reason why this group was included, and we performed comparisons with this group for scRNA seq studies as well. However, to reduce the complexity of the study, we only shared the comparisons with the Air control in this manuscript. We will include the comparisons made with the PG/VG group as a Supplementary File in the revised manuscript to give interested readers access to the study results and allow them to make the necessary interpretations for future research.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We thank the reviewer for this suggestion; however, we cannot perform BrdU or other proliferation assays on neutrophils for now. We plan to include these in the study designs of our future work, but limited funds prevent further experimentation to support this claim in this study. We state clearly that this is only a scRNA seq finding that requires further study, to avoid over-interpretation of our results.

      It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      We thank the reviewer for raising this concern. We understand that this is a valid point and will include all the necessary information regarding the statistics and other related parameters in the revised manuscript.

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PG/VG. In some cases, the carrier PG/VG looks worse than some of the flavors (which have nicotine).

      We will include the comparisons with PG/VG as a supplementary file in our revised manuscript; however, we do not intend to describe all those changes in detail in the main manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than any real difference, e.g. Is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco, it randomly hit the significance mark. This is but an example of the problems with the analysis and conclusions.

      While we agree in part with the concern raised here, we wish to point out that there are limitations to every experiment. In our opinion, an omics study is not necessarily aimed to find the changes at transcript level with absolute certainty, rather to identify probable cell and gene targets to validate with subsequent work. We never claim that our findings are absolute outcomes but rather add the limitation of sample number and need for further research at every step. The strength of this work is to be the first study of its kind looking at changes in the lung cell population at single cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating with future work. We still strongly believe that a dataset like this is a useful resource for a wider audience to allow efficient study designs and hence it is befitting to be published and discussed amongst our peers.

      The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      We thank the reviewer for pointing this out. We realize that the data might be difficult to understand due to scaling of the color codes for the heatmap. We will change the graphical representation and include actual number for fold change in our revised manuscript to allow easy interpretation of these results.
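
      The difference between the two scalings the reviewer mentions can be sketched as follows. This is only an illustration with made-up expression values, assuming (hypothetically) that the heatmap colours come from a row-wise z-score; it is not the analysis used in the manuscript.

```python
from statistics import mean, pstdev

# Hypothetical mean expression of one gene per exposure group (not real data).
expr = {"air": 2.0, "PG/VG": 3.0, "fruit": 4.0, "tobacco": 7.0}
values = list(expr.values())

# Fold change relative to the air control: air vs air is 1 by construction.
fold_change = {group: value / expr["air"] for group, value in expr.items()}

# Row-wise z-score: each value is centred on the row mean and divided by the
# row standard deviation, so the control group has no special value.
mu, sd = mean(values), pstdev(values)
z_score = {group: (value - mu) / sd for group, value in expr.items()}

for group in expr:
    print(f"{group:8s} fold change = {fold_change[group]:.2f}  z = {z_score[group]:+.2f}")
```

      Under a z-score the air column need not equal 1 (or 0), which is why reporting the actual fold-change values removes the ambiguity.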

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      While we have included a pictorial representation of the n number in Figure 1A and mentioned the n number in the legend of each figure, we understand that it may be difficult to navigate. We will attempt to address this more clearly in the revised manuscript.

      However, with respect to the second comment we would like to differ from the reviewer’s opinion. Each scRNA seq data had 2 samples – one for male and another for female which has been clearly shown in the current figures. The pooling of cells as mentioned in the comment happened at the stage of preparation of cell suspension from each sex/group at the start of the sequencing. We do not have any means to show the variability amongst pooled samples, which we acknowledge as a shortcoming of our work. So, in terms of representation of the heatmaps and data analyses we have included all the needed information to uphold transparency of our study design and data visualization for each figure and would like to stick to the current representations.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up-and-down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      We thank the reviewer for identifying the key strengths of our work and listing it in a concise and well-rounded fashion.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to study the effects of chronic exposures on lung tissues. We were interested in delineating the effect of acute exposures for which the proposed study design was chosen. Previous work by our group has performed similar exposures and has been well received by the community. We understand that chronic exposures will be interesting to look at, however that was not the purpose of this pilot study. We will now explicitly mention this aspect in the revised manuscript.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we will include the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript.

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and will include all the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      We are in the process of performing a few validation experiments and intend to add a few other pieces of data to this manuscript to strengthen the overall merit of our findings. However, as pointed out by the reviewer themselves, the strength of this work lies in the first-ever scRNA seq analyses of mice exposed to differently flavored e-cig aerosols in vivo. We also show cell-specific differential gene expression and address some of the major questions raised in e-cig research, including the day-to-day release of metals from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which has been discussed as a shortcoming. However, the major strength of this work is not in identifying specific trends but rather in exploring possible cell and gene targets so that the study can be expanded to longer (chronic) exposures with a larger sample group.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We are conducting some studies and will include the validatory experiments and staining in the revised manuscript to support our findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some cell clusters (especially epithelial cells) were too low. So, though we have performed the differential gene expression analyses on all the cell clusters, we refrained from discussing it in the manuscript to avoid over interpretation of our results. Only clusters with high enough (~1000) cells per sex per group were used to plot the heatmaps. We will also include the cell numbers for each cell type in the revisions to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript. This would enable the interested readers to access the raw data and study the cell types of interest in detail based on their study requirements. This data will be a useful resource for all in this community to inform and design future studies.

    1. Author response:

      Reviewer #1:

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We will revise the manuscript to correct the abovementioned issues.

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We will carefully review, verify claims, and correct conclusions where needed.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We will analyse the data in Figure 7 separately for AUDp and secondary auditory cortices to test regional differences. Additionally, we will provide a table summarizing key neuronal firing properties for each area during passive recordings to clarify how activity varies across cortical subregions and developmental stages.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking- as indicated on l.164, but also so many other things, like a lower frustration threshold, lower satiation, more energy, etc).

      We will address issues around lick bias including alternative explanations, such as differences in motivation or impulsivity.

      Reviewer #2:

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We will edit the discussion and clarify these points. In addition, we will adjust and extend the methodology section to clarify the rationale of our analysis.

      B) The results of the optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We agree that the effects observed in our optogenetic manipulation warrant further discussion. We will extend on the analysis and discussion of ACx silencing.

      Reviewer #3:

      A) One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      We recognize the need for a more nuanced analysis for the head-fixed version of the task. We will extend the behavioral analysis and provide more details to clarify these points.
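
      The reviewer's point that d' alone does not determine behaviour can be illustrated with the standard signal-detection formulas, d' = z(hit rate) - z(false-alarm rate) and criterion c = -(z(hit rate) + z(false-alarm rate)) / 2. The sketch below uses hypothetical rates, not data from the study.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index: separation between signal and noise distributions."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate: float, fa_rate: float) -> float:
    """Response bias: negative values indicate a tendency to respond (lick)."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


# Two hypothetical animals: one licks liberally, one withholds licking.
liberal = (0.95, 0.40)
conservative = (0.60, 0.05)

for name, (hit, fa) in {"liberal": liberal, "conservative": conservative}.items():
    print(f"{name:12s} d' = {d_prime(hit, fa):.2f}  c = {criterion(hit, fa):+.2f}")
```

      Both hypothetical animals have the same d' but opposite response bias, which is why hit and false-alarm rates are informative in addition to d'.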

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We appreciate the reviewer pointing out instances where our citations may not fully support our claims. We will carefully review the relevant citations and revise them to ensure they accurately reflect the findings of the cited studies. We will update references in lines 64–66 and 72–74 to better align with the specific stimulus types and developmental timelines discussed.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We agree that pooling neurons from multiple auditory cortical regions could potentially obscure region-specific differences. However, we addressed this concern by analyzing regional differences in neuronal firing properties, as shown in Supplementary Figures S4-1 and S4-2, and Supplementary Tables 2 and 3. Additionally, we examined stimulus-related and choice-related activity across regions and found no significant differences, as presented in Supplementary Figure S4-3. Please see our response to Reviewer 1, where we further elaborate on this point.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We acknowledge that other cortical layers are also of interest and may contribute differently to auditory processing across development. Our focus on layers 5/6 was motivated by both methodological considerations and biological relevance. These layers contain many of the principal output neurons of the auditory cortex, and are therefore well positioned to influence downstream decision-making circuits. We will clarify this rationale in the revised manuscript and note the limitations of our approach.

    1. Author response:

      Reviewer #1 (Public Review):

      The work of Umetani et al. monitors the death of about 100,000 cells caused by lethal antibiotic treatments in a microfluidic device. They observe that the surviving bacteria are either in a dormant or in a non-dormant state prior to the antibiotic treatment. They then study the relative abundances of these different persister cells when varying the physiological state of the culture. In agreement with previous observations, they observe that late stationary phase cultures harbor a high number of dormant persister cells and that this number goes down as the culture is more exponential but remains non-zero, suggesting that cultures at the exponential phase contain different types of persister bacteria. These results were qualitatively similar in a rich and poor medium. Further characterization of the growing persister bacteria shows that they often form L-forms, have low RpoS-mCherry expression levels and grow only slightly more slowly than the non-persister bacteria. Taken together, these results draw a detailed view of persister bacteria and the way they may survive extensive antibiotic treatments. However, in order to represent a substantial advance on previous knowledge, a deeper analysis of the persister bacteria should be done.

      We thank the reviewer for suggesting the addition of more detailed analyses of persister cells. As we wrote in our response to Essential Revision 1, we now include a new section titled “Response of growing persisters to Amp exposure is heterogeneous” (Page 11-12) and present the results of the detailed analyses of single-cell dynamics of growth and cell morphology over the course of the pre-exposure, exposure, and post-exposure periods (Fig. 2D and H, Fig. 4B and D, Fig. 4 – figure supplement 1 and 2, Fig. 5B and D, Fig. 5 – figure supplement 1, Fig. 8B and D, and Figure 8 – figure supplement 1). The new results characterize differential responses to Amp treatment among growing persister cells (Fig. 4A-D, Fig. 4 – figure supplement 1, Fig. 4 – figure supplement 2A, Fig. 5A-D, and Fig. 5 – figure supplement 1), comparable division rates of MG1655 between non-surviving cells and persister cells growing prior to antibiotic treatments (Fig. 4E and Fig. 8E), except for the post-exponential phase cell populations of MF1 to Amp treatment in the LB medium and the post-exponential phase cell populations of MG1655 to Amp treatment in the M9 medium (Fig. 4 – figure supplement 2B and Fig. 5E) and the presence of persister cells to CPFX that avoid filamentation after the treatment (Fig. 8C and D, and Fig. 8 – figure supplement 1). We believe that these new analyses would provide new insights into the diverse dynamics and survival modes of antibiotic persistence at the single-cell level and represent important contributions to the field.

      Reviewer #2 (Public Review):

      The main question asked by Umetani et al. is whether persister cells to ampicillin arise preferentially from dormant, non-dividing cells or from cells that are actively growing before antibiotic exposure. The authors tracked persister cells generated from populations at different growth phases and culture media using a microfluidic device coupled to fluorescence microscopy, which is a challenge due to the low frequency of these persister cells. One of the main conclusions is that the majority of persisters arising in exponentially-growing populations originated from actively-dividing cells before the antibiotic treatment, reinforcing the idea that dormancy is not a prerequisite for persister formation. The authors made use of a fluorescent reporter monitoring RpoS activity (RpoS-mCherry fusion) and observed that RpoS levels in these persister cells were low. In the few lineages that exhibited no growth before the ampicillin treatment, RpoS levels were low as well, indicating that RpoS is not a predictive marker for persistence. By performing the same experiment with early and late stationary phase cultures, the authors observed that the proportion of persister cells that originated from dormant cells before the ampicillin treatment is significantly increased under these conditions. In the late stationary phase condition, dormant cells were expressing high levels of RpoS. The authors suggested that RpoS-mCherry proteins form aggregates, which they proposed to be a characteristic of 'deep dormancy'. These cells were mostly unable to restart growth after the antibiotic removal while others with the lowest levels of RpoS tended to be persisters. Confirming that these cells indeed contain protein aggregates as well as determining the physiological state of these cells appears to be crucial.

      We thank reviewer #2 for pointing out the critical issue with the RpoS-mCherry fusion that we used to quantify RpoS expression levels in single cells in the original manuscript. As explained in our reply to the comments below, we performed a suggested experiment and confirmed that the RpoS function was impaired by tagging it with mCherry. To resolve this issue, we repeated almost all the experiments using the wild-type strain MG1655 and confirmed the reproducibility of the main results (Fig. 3, Fig. 3 – figure supplement 1, and Fig. 7). Due to this change of the main strain used in this study, we removed the results on the correlation between RpoS expression and the persistence trait in the revised manuscript because it may not reflect the relationship of intact RpoS. However, we decided to still keep and show some of the results with the MF1 strain, such as the population killing curves and the survival mode analyses, because they also provide insight into the role of RpoS in antibiotic persistence. In particular, we found both beneficial and detrimental effects of RpoS on antibiotic persistence, depending on culture conditions and duration of antibiotic treatment (Fig. 1 – figure supplement 3 and Fig. 6 – figure supplement 1). Therefore, we have included these results and related discussions in the revised manuscript.

      Reviewer #3 (Public Review):

      In their manuscript, Umetani, et al. address the question of the origin of persister bacteria using single-cell approaches. Persistence refers to a physiological state where bacteria are less sensitive to antibiotherapy, although they have not acquired a resistance mutation; importantly, the concept of persistence has been refined in the past decade to distinguish it from tolerance where bacteria are only transiently insensitive. Since persister cells are very rare in growing populations (typically 1e-5 or 1e-6), it is very challenging to observe them directly. It had been proposed that individual cells surviving antibiotics are not growing at the start of the treatment, but recent studies (nicely reviewed in the introduction) where persister bacteria were observed directly do not support this link. Following a similar line, the authors nonetheless still aim at "investigating whether non-growing cells are predominantly responsible for bacterial persistence". Based on new experimental data, they claim the contrary that most surviving cells were "actively growing before drug exposure" and that their work "reveals diverse survival pathways underlying antibiotic persistence".

      We thank the reviewer for this helpful comment, which suggested to us that some revisions in our Introduction would better place our study in the context of previous understanding of antibiotic persistence. As mentioned in our response to Essential Revision 4 and the second comment of Reviewer 1's Recommendations for the authors, we have modified the Introduction to more appropriately place our study in the context of the field.

      The main strengths of the manuscript are in my opinion:

      - To report on direct observation of E. coli persisters to ampicillin (200µg/mL) in 5 different growth media (typically 20 persisters or more per condition, one condition with 12 only), which constitutes without a doubt an experimental tour de force.

      - To aim at bridging the population level and the single-cell level by measuring relevant variables for each and analyzing them jointly.

      - To demonstrate that in most conditions a large fraction of surviving cells was actively growing before drug exposure.

      In addition, although it is well-known that E. coli doesn't need to maintain its rod shape for surviving and dividing, I found very remarkable in their data the extent to which morphology can be affected in persister cells and their progeny, since this really challenges our understanding of E. coli's "lifestyle" (these swimming amoeba-like cells in Supp Video 11 are mind-blowing!).

      We are grateful to the reviewer for the articulation of the strength of this study. 

      Unfortunately, these positive aspects are counter-balanced by several shortcomings in the way experiments are analyzed and interpreted, which I explain below. Moreover, the manuscript is written in a way that makes it very hard to find important information on how experiments are done and is likely to leave the reader with an impression of confusion about what the main findings actually are.

      We thank the reviewer for pointing out these important issues regarding the original manuscript. Please see our replies below regarding how we responded to each specific comment to resolve the issue. To make the experimental methods and procedures more accessible and interpretable, we have added more explanations of the experimental details to the Results and Methods sections. Furthermore, since we understood that some of the confusion came from the insufficient explanation of the preculture procedures for the microfluidic experiments, we have modified the schematic illustration of the method shown in Fig. S1 in the original manuscript and moved it to the first main figure in the revised manuscript (Fig. 1C and D). We have also added an illustration that explains the cultivation procedures for the batch culture experiments as Fig. 6A.

      My major concerns are the following:

      (1) The main interpretation framework proposed by the authors is to assess whether cells not growing before drug exposure (so-called "dormant") are more or less likely to survive the treatment than growing ones ("non-dormant"). Fig 2A and Fig 3G show the main conclusions of the article from this perspective, that growing cells can survive the treatment and that the fraction of persisters in a given condition is not explained by the fraction of "dormant" cells, respectively. With this analysis, the authors essentially assume that "dormant" cells are of the same type in their different conditions, which ignores the progress in this field over the last decade (Balaban et al. 2019). I argue on the contrary that the observation of "diverse modes of survival in antibiotic persistence" is expected from their experimental design. In particular, the sensitivity of E. coli to beta-lactams such as ampicillin is expected to be much lower during the lag out of the stationary phase, a phenomenon which has been coined "tolerance"; hence in the Late Stationary condition, two subpopulations coexist for which different response to ampicillin is expected. I propose steps toward a more compelling interpretation of the experimental data. Should this point be taken seriously by the authors, it, unfortunately, implies a major rewriting of the article, including its title.

      We thank the reviewer for bringing to our attention the point that may have caused confusion in the original manuscript. 

      The primary purpose of this manuscript was not to assess whether non-growing cells prior to drug exposure are more or less likely to survive treatment than growing cells. Rather, we wanted to examine how different persister cell dynamics emerge at the single-cell level depending on previous cultivation history, growth media, and antibiotic types. We believe that this point is clearer in the revised manuscript with the newly added single-cell dynamics data (Fig. 2D, 2H, 4B, 4D, Fig. 4 – figure supplement 1 and 2A, Fig. 5B, 5D, Fig. 5 – figure supplement 1, Fig. 8B, 8D, and Fig. 8 – figure supplement 1). 

      We also did not mean to imply that "dormant cells" were of the same type under different conditions, as we were aware of the diversity of cellular states of non-growing cells, as well as the reduced sensitivity of cells to antibiotics during the lag out of stationary phase. We believe that one of the reasons this point may have been unclear is that in the previous version we had referred to all cells that were not growing prior to antibiotic treatment as "dormant cells", a term that is often used in a more restricted way to refer to cells under prolonged growth arrest. Therefore, in the revised manuscript, we have avoided the term "dormant cells" and instead simply referred to these as "non-growing cells". Accordingly, we have changed the title of the paper from "Observation of non-dormant persister cells reveals diverse modes of survival in antibiotic persistence" to "Observation of persister cell histories reveals diverse modes of survival in antibiotic persistence".

      To further address these points, we have improved the description of the experimental procedures for the single-cell measurements (see the reviewer's next comment as well). The non-growing persisters of the MF1 strain found in the post-exponential phase cell populations must be of a different type than those found in the post-early and post-late stationary phase cell populations due to the experimental design. All early and late stationary phase cells were maintained in a non-growing state by flowing conditioned media prepared from the early and late stationary phase cultures until the start of the time-lapse measurements. Thus, aside from potential physiological heterogeneity, the non-growing cells prior to drug treatment are all long lagging cells. On the other hand, for the post-exponential phase condition, we maintained exponential growth conditions during the period from the start of the second pre-culture to the start of antibiotic treatment, including the period during sample preparation for time-lapse measurements. Given the exponential dilution by growth of cell populations, the non-growing persisters are unlikely to be long lagging cells (see our response to Reviewer 2's third comment in "Recommendations for the authors"). We now describe these experimental procedures in more detail in the Results section (L161-178, L287-297). In addition, we discuss the diversity of cellular states of both non-growing and growing cells in Discussion, citing literature (L545-557).
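
      The dilution argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that carried-over lagging cells do not divide or die while every other cell doubles each generation; the starting fraction and number of doublings below are hypothetical and are not measurements from this study.

```python
def lagging_fraction(initial_fraction: float, doublings: float) -> float:
    """Fraction of non-dividing carry-over cells after the growing
    subpopulation has undergone the given number of doublings."""
    lagging = initial_fraction                       # stays constant
    growing = (1.0 - initial_fraction) * 2.0 ** doublings
    return lagging / (lagging + growing)


# Hypothetical example: 1% lagging cells at inoculation.
for n in (0, 5, 10, 15):
    print(f"{n:2d} doublings -> lagging fraction = {lagging_fraction(0.01, n):.2e}")
```

      Under these assumptions the lagging fraction roughly halves with each doubling, so cells carried over from stationary phase quickly become a negligible part of a population kept in exponential growth.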

      (2) The way the authors describe their experiments with bacteria in the stationary phase is very problematic. For instance, they write that they "sampled cells from early and late stationary phases (...) and exposed them to 200 μg/mL of Amp in both batch and single-cell cultures." For any reader in a hurry (hence skipping methods and/or supplementary figure), this leads to believe that bacteria sampled in the stationary phase were exposed to the drug right away (either by adding the drug to the stationary phase sample, or more classically by transferring cells to fresh media with antibiotics). However, it turns out that, after sampling and loading in the microfluidic device, bacteria are grown 2 h in LB (or 4 h in M9) - I don't know what to think of such a blatant omission. The names chosen for each condition should reflect their most important aspects, here "stationary" is simply not appropriate - maybe something like "post early stationary" instead. In any case, I believe that this point highlights further the misconception pointed out in 1 and implies that the average reader will be at best confused, and probably misled.

      We again thank the reviewer for pointing out the insufficient explanation of the method for the single-cell measurements and the helpful recommendation regarding our nomenclature for different conditions. As mentioned above, we now present the previous supplementary figure that schematically explains the experimental procedure as the first main figure to clarify how we prepared the cells loaded into the microfluidic device for single-cell measurements (Fig. 1C and D). Also, following the reviewer's suggestion, we now refer to the conditions as "post-exponential phase," "post-early stationary phase," and "post-late stationary phase" in the revised manuscript. 

      We included a 2-hour (or 4-hour in M9) cultivation period in fresh medium in batch cultures for measuring killing curves to make the cultivation conditions prior to antibiotic treatment as similar as possible between batch and microfluidic experiments. We have clarified the presence of preexposure cultivation of post-early stationary and post-late stationary phase cell populations in the fresh medium before treating them with antibiotics (L264-269, Fig. 6A), so that readers can more easily recognize the experimental conditions.

      (3) Figures 4 and 5 are of very minor significance, and the methodology used in Fig 4 is questionable. The authors measure the abundance of an RpoS-mCherry translational fusion because its "high expression has been suggested to predict persistence". The rationale for this (that an RpoS-mCherry fusion would be a proxy for intracellular ppGpp levels, and in turn predict persistence) has never been firmly established, and the standards used in the article where this reporter was introduced (Maisonneuve, Castro-Camargo, and Gerdes 2013) are notoriously low (which eventually led to its retraction) - I don't know what to think of the fact that the authors cite a review by this group rather than their retracted article. While transcriptional fusions of promoters regulated by RpoS have been proposed to measure its regulatory activity (Patange et al. 2018), the combination of self-regulation and complex post-translational regulation of rpoS makes the physical meaning of the reporter used here completely unclear. Moreover, this translational fusion is introduced without doing any of the necessary controls to demonstrate that the activity of RpoS is not impaired by the addition of the fluorescent protein. Fig 5 simply reports the existence of persisters to ciprofloxacin growing before the treatment. This might be a new observation but it is not unexpected given that a similar observation has been made with a similar drug, ofloxacin (Goormaghtigh and van Melderen 2019), as pointed out in the introduction. There is no further quantitative claim on this.

      We thank the reviewer for pointing out the issue of the RpoS-mCherry fusion. As we mentioned in our response to Essential Revision 2 and also to the comment from reviewer #2, we have tested the sensitivity of this fluorescent reporter strain to oxidative stress and confirmed that it is as sensitive as the rpoS strain (Fig. 1 – figure supplement 1C). Therefore, the RpoS function seems to be defective in this strain, as now explained in Results (L69-79). After confirming the problem with the RpoS-mCherry fusion, we removed all analyses and related arguments that relied on the RpoS expression level (previous Figure 4). In addition, we repeated almost all the experiments with the original MG1655 strain to confirm that the observed results are not specific to the problematic reporter strain. 

      Regarding the experiments with CPFX, we have added a more detailed analysis of single cell dynamics and found that, contrary to the reported results for ofloxacin, not all persistent cells show filamentation after drug withdrawal (Fig. 8C and D, Fig. 8 – figure supplement 1). In addition, we performed new microfluidic experiments in which we treated post-late stationary phase cells with CPFX (Fig. 3). In contrast to the Amp treatment result and the previous study that reported the persistence of post-stationary phase cell populations to ofloxacin (ref. 20), all the persisters for which we identified the pre-exposure growth traits in this condition grew normally prior to CPFX treatment. These newly added analyses and experiments clarify the significance of the CPFX experiments. 

      (4) The authors don't mention the dead volume nor the speed of media exchange in their device. Hopefully, it is short compared to the duration of the treatment; however, it is challenging to remove all antibiotics after the treatment and only 1e-3 or 1e-4 of the treatment concentration is already susceptible to affecting regrowth in fresh media. If this is described in another article, it would be worth adding a comment in the main text.

      We thank the reviewer for bringing up this important point. We have added the perfusion chamber volume and medium flow rate information in the Methods section (L809-817).   

      In the study in which two of the authors participated, the medium exchange rate across the semipermeable membrane was evaluated in a similar device with similar microchamber dimensions (ref. 26). There, we confirmed that the medium exchange was completed within 5 min, which is much shorter than the period of antibiotic treatment and post-antibiotic treatment periods for observing regrowth. We have also included this information in the main text with the reference (L58-63).

      Despite the relatively high medium exchange rate, we cannot formally exclude the possibility that a small amount of antibiotic may remain in the device, e.g. due to non-specific adsorption on the internal surface of the microchambers. In such cases, the residual antibiotics may influence the physiological states of the cells and the regrowth kinetics in the post-exposure periods, as suggested by the reviewer. However, the frequencies of persister cells in the cell populations in our single-cell measurements are comparable to those in the batch culture measurements. Therefore, the removal of antibiotic drugs in our device is at least as efficient as in the batch culture assay. To clarify this point, we have added a paragraph to the Discussion with a reference that reviews the influence of antibiotics at concentrations significantly lower than the MICs (L482-489).

      (5) Fig 2A supports the main finding that a significant fraction of bacteria surviving the treatment are growing before drug exposure, but it uses a poorly chosen representation.

      - In order to compare between conditions, one would like to see the fraction of each type in the population.

      - The current representation (of a fraction of each type among surviving cells) requires a side-by-side comparison with a random sample (which will practically be equivalent to the fraction of each type among killed cells) in order to be informative.

      We have changed the style of the previous Fig. 2A to show the fraction of each type in the population instead of the fraction of each type among surviving cells (Fig. 3 and Fig. 3-figure supplement 1).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to identify the proteins that compose the electrical synapse, which are much less understood than those of the chemical synapse. Identifying these proteins is important to understand how synaptogenesis and conductance are regulated in these synapses. The authors identified more than 50 new proteins and used immunoprecipitation and immunostaining to validate their interaction of localization. One new protein, a scaffolding protein, shows particularly strong evidence of being an integral component of the electrical synapse. However, many key experimental details are missing (e.g. mass spectrometry), making it difficult to assess the strength of the evidence.

      Strengths:

      One newly identified protein, SIPA1L3, has been validated both by immunoprecipitation and immunohistochemistry. The localization at the electrical synapse is very striking.

      A large number of candidate interacting proteins were validated with immunostaining in vivo or in vitro.

      Weaknesses:

      There is no systematic comparison between the zebrafish and mouse proteome. The claim that there is "a high degree of evolutionary conservation" was not substantiated.

      We agree that we should have included a comprehensive comparison of proteins captured in the different species.  We are assembling this table and it will be included in the revised manuscript.  There is, indeed, significant conservation of many of the proteins enriched in both species.

      No description of how mass spectrometry was done and what type of validation was done.

      Since the mass spec was outsourced to a core facility, we had not included methodological details. We have requested these and will include full details in the revised version of the manuscript. In terms of “validation,” enrichment of proteins at electrical synapses was determined based on capture relative to control samples (non-transgenic zebrafish retinas or non-transgenic mouse retinas infected with the dGBP-TurboID virus) captured and processed at the same time. Actual validations based on protein co-localization and pull-downs are the subject of the rest of the manuscript, and could only be done for a fraction of the identified proteins. This type of validation can be pursued in many future studies.

      The threshold for enrichment seems arbitrary.

      Yes, the thresholds are somewhat arbitrary.  This is due to the fact that experiments that captured larger total amounts of protein (mouse retina samples) had higher signal-to-noise ratio than those that captured smaller total amounts of protein (zebrafish retina).  This allowed us to use a more stringent threshold in the mouse dataset to focus on high probability captured proteins. 

      Inconsistent nomenclature and punctuation usage.

      We have scanned through the manuscript and updated terms that were used inconsistently in the interim revision of the manuscript.

      To describe the mass spec procedure, we will get in touch with the mass spec facility and provide the details in the next round of submission.

      The description of figures is very sparse and error-prone (e.g. Figure 6).

      In Figure 1B, there is very broad non-specific labeling by avidin in zebrafish (In contrast to the more specific avidin binding in mice, Figure 2B). How are the authors certain that the enrichment is specific at the electrical synapse?

      The enrichment of the proteins we identified is specific for electrical synapses because we compared the abundance of all candidates between Cx35b-V5-TurboID and wildtype retinas. Proteins that are components of electrical synapses will only show up in the Cx35b-V5-TurboID condition. The western blot (Strep-HRP) in Figure 1C shows the differences in the streptavidin labeling and hence the enrichment of proteins that are part of electrical synapses. Moreover, while the background appears to be quite abundant in sections, biotinylation is a rare posttranslational modification and mainly occurs in carboxylases, which appear as the two intense bands above 50 and 75 kDa. The background mainly originates from these two proteins.

      In Figure 1E, there is very little colocalization between Cx35 and Cx34.7. More quantification is needed to show that it is indeed "frequently associated."

      We agree that “frequently associated” is too strong as a statement. We corrected this and instead wrote “that Cx34.7 was only expressed in the outer plexiform layer (OPL) where it was associated with Cx35b at some gap junctions” in line 150. There are many gap junctions at which Cx35b is not colocalized with Cx34.7. 

      Expression of GFP in HCs would potentially be an issue, since GFP is fused to Cx36 (regardless of whether HC expresses Cx36 endogenously) and V5-TurboID-dGBP can bind to GFP and biotinylate any adjacent protein.  

      Thank you for this suggestion! There should be no Cx36-GFP expression in horizontal cells, which means that the nanobody cannot bind to anything in these cells. Moreover, to recognize specific signals from non-specific background, we included wild type retinas throughout the entire experiments. This condition controls for non-specific biotinylation.

      Figure 7: the description does not match up with the figure regarding ZO-1 and ZO-2.

      It appears that a portion of the figure legend was left out of the submitted version of the manuscript.  We have put the legend for panels A through C back into the manuscript in the interim revision.

      Reviewer #2 (Public review):

      Summary:

      This study aimed to uncover the protein composition and evolutionary conservation of electrical synapses in retinal neurons. The authors employed two complementary BioID approaches: expressing a Cx35b-TurboID fusion protein in zebrafish photoreceptors and using GFP-directed TurboID in Cx36-EGFP-labeled mouse AII amacrine cells. They identified conserved ZO proteins and endocytosis components in both species, along with over 50 novel proteins related to adhesion, cytoskeleton remodeling, membrane trafficking, and chemical synapses. Through a series of validation studies, including immunohistochemistry, in vitro interaction assays, and immunoprecipitation, they demonstrate that the novel scaffold protein SIPA1L3 interacts with both Cx36 and ZO proteins at electrical synapses. Furthermore, they identify and localize proteins ZO-1, ZO-2, CGN, SIPA1L3, Syt4, SJ2BP, and BAI1 at AII/cone bipolar cell gap junctions.

      Strengths:

      The study demonstrates several significant strengths in both experimental design and validation approaches. First, the dual-species approach provides valuable insights into the evolutionary conservation of electrical synapse components across vertebrates. Second, the authors compare two different TurboID strategies in mice and demonstrate that the HKamac promoter and GFP-directed approach can successfully target the electrical synapse proteome of mouse AII amacrine cells. Third, they employed multiple complementary validation approaches, including retinal section immunohistochemistry, in vitro interaction assays, and immunoprecipitation, providing evidence supporting the presence and interaction of these proteins at electrical synapses.

      Weaknesses:

      The conclusions of this paper are supported by data; however, some aspects of the quantitative proteomics analysis require clarification and more detailed documentation. The differential threshold criteria (>3 log2 fold for mouse vs >1 log2 fold for zebrafish) will benefit from biological justification, particularly given the cross-species comparison. Additionally, providing details on the number of biological or technical replicates used in this study, along with analyses of how these replicates compare to each other, would strengthen the confidence in the identification of candidate proteins. Furthermore, including negative controls for the histological validation of proteins interacting with Cx36 could increase the reliability of the staining results.

      While the study successfully characterized the presence of candidate proteins at the electrical synapses between AII amacrine cells and cone bipolar cells, it did not compare protein compositions between the different types of electrical synapses within the circuit. Given that AII amacrine cells form both homologous (AII-AII) and heterologous (AII-cone bipolar cell) electrical synapses, connections that serve distinct functional roles in retinal signal processing, a comparative analysis of their molecular compositions could have provided important insights into synapse specificity.

      Reviewer #3 (Public review):

      Summary:

      This study by Tetenborg S et al. identifies proteins that are physically closely associated with gap junctions in retinal neurons of mice and zebrafish using BioID, a technique that labels and isolates proteins proximal to a protein of interest. These proteins include scaffold proteins, adhesion molecules, chemical synapse proteins, components of the endocytic machinery, and cytoskeleton-associated proteins. Using a combination of genetic tools and meticulously executed immunostaining, the authors further verified the colocalizations of some of the identified proteins with connexin-positive gap junctions. The findings in this study highlight the complexity of gap junctions. Electrical synapses are abundant in the nervous system, yet their regulatory mechanisms are far less understood than those of chemical synapses. This work will provide valuable information for future studies aiming to elucidate the regulatory mechanisms essential for the function of neural circuits.

      Strengths:

      A key strength of this work is the identification of novel gap junction-associated proteins in AII amacrine cells and photoreceptors using BioID in combination with various genetic tools. The well-studied functions of gap junctions in these neurons will facilitate future research into the functions of the identified proteins in regulating electrical synapses.

      Thank you for these comments.

      Weaknesses:

      I do not see major weaknesses in this paper. A minor point is that, although the immunostaining in this study is beautifully executed, the quantification to verify the colocalization of the identified proteins with gap junctions is missing. In particular, endocytosis component proteins are abundant in the IPL, making it unclear whether their colocalization with gap junction is above chance level (e.g. EPS15l1, HIP1R, SNAP91, ITSN in Figure 3B).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. But co-depletion of Dl and Put led to a similar phenotype as depletion of Put alone. This result suggested that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss of function mutants increased symmetric differentiation, which caused ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the numb clone under bleomycin results as demonstrating an essential role of Numb in ISC maintenance during regeneration.

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Quantification of Fig.1 has been added. 

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put-mutation alone.

      We have tested the genetic interaction between put and numb using Put RNAi and Numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, put mutant clones or esg<sup>ts</sup> > Put-RNAi induced a rapid loss of ISCs (within 8 days). We did not observe further enhancement of the stem cell loss phenotype in Put and Numb double RNAi guts.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role but in either during homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      We have revised the language and changed “essential” to “important”.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, Elife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you for the suggestion. In fact, we conducted all the analyses in region 4. We have added a statement to clarify this in the revised manuscript.

      (b) It is not clear to me why MARCM clones were induced and then flies grown at 18°C? It would help to explain why they used this unconventional protocol.

      We kept the flies at 18°C to avoid spontaneous clones.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and exhibited loss of N pathway activity (as determined by the N pathway reporter Su(H)-lacZ) after RNAi for 8 days (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we have changed the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just field as the density might change (2E for example).

      We focused on R4 region for quantification where the cell density did not exhibit apparent change in different experimental groups. In addition, we have examined many guts for quantification. It is very unlikely that the difference in the esg-GFP+ cell number is caused by change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a meaningful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification of Fig.1 has been added. 

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta).  Pros+ exhibited “dot-like” nuclear staining while CD2 staining outlined the cell membrane of EBs. We have clarified this in the revised figure legend.

      (e) Fig3: it would be nice to have the size clone quantification instead of the distribution between groups of 2 cell 3 cells 4 cell clones.

      Because of the heterogeneity of clone size for each genotype, we chose to group clones based on their sizes (2, 3-6, 6-8, >8 cells) and quantified the distribution of individual groups for each genotype, which clearly showed an overall reduction in clone size for mad numb double mutant clones. We and others have used the same clone size analysis in previous studies (e.g., Tian and Jiang, eLife 2014).

      (f) How many times were experiments performed?

      All experiments were performed at least 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones treated with DSS exhibited a slight reduction of clone size, evidenced by a higher percentage of 2-cell clones and a lower percentage of > 8 cell clones. This reduction is less significant in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background of this stock.

      (5) There is probably a mistake on sentence line 314 -316 "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We have modified the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb has a function in inhibiting Notch pathway to maintain intestinal stem cells, and is a backup mechanism with BMP pathway in maintaining midgut stem cell mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provides believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISC and in promoting regeneration especially after damage by bleomycin, which may damage enterocytes and therefore disrupt BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut has a weak reduction of ISC number (Fig. 3 and 5), as well as weak albeit not statistically significant reduction of ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. The mad<sup>1-2</sup> allele used here as stated below may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare the number of ISC number and clone sizes between other BMP experiments to provide the readers with a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to the ISC number and gut homeostasis.

      Thanks for the comment. We have tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones or double RNAi) resulted in ISC loss regardless of the status of numb, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> alleles (P insertion) are supposed to be weak alleles and that might be suitable for genetic enhancement assays here together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is kind of counter-intuitive that they should have a similar ISC loss and ISC clone size reduction.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss, therefore, may provide a sensitized background to determine the role of Numb in ISC self-renewal. The increased proliferation of ISC/ clone size associated with mad<sup>1-2</sup> and mad RNAi is due to the fact that reduction of BMP signaling in either EC or EB non-autonomously induces stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size due to loss of ISC in many clones.

      A much stronger phenotype was observed when numb mutants were subject to treatment with the tissue-damaging agent Bleomycin, which causes damage in different ways than DSS. Bleomycin was previously shown to cause mainly enterocyte damage, and is therefore more likely to disrupt BMP signaling from ECs. Therefore, this treatment together with loss of numb led to a highly significant reduction of ISCs in clones and reduction of clone size/proliferation. One improvement is that it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and the comparison to the strength of the RNAi allele. Because the phenotypes are weak and more variable, the use of specific reagents is important.

      We have included information about the two numb alleles in the “Materials and Methods”. numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones were devoid of stem cells than numb<sup>4</sup> mutant clones.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP (esgts>Tkv<sup>CA</sup>) alone induced stem cell tumors (Tian et al., 2014), whereas overexpression of Numb did not increase stem cell number, although overexpression of Numb in wing discs produced phenotypes indicative of inhibition of N (our unpublished observation), making it difficult to test the synergistic effect of activating both BMP and Numb.

      Reviewer #1 (Recommendations for the authors):

      - Cartoon of RGT in Fig 4 needs to be improved. We need to know what chromosome harbors the esgts. It is not sufficient to simply put the location of the ubi-GFP and ubi-RFP (on 19A) and not show the location of other components of the RGT system.

      Thank you for the suggestion. We have revised the cartoon in Fig. 4 to include all three pairs of chromosomes and indicate where the esgts driver and UAS-RNAi are located. In addition, we have included the genotypes for all the genetic experiments in the Method section.

      - Quantification of the results in Fig. 1

      Quantification of Fig.1 has been added. 

      - The authors need to explain the premise more carefully (see above) and explain whether or not they tested put, numb double knockdowns.

      We have explained why we did not test put numb double RNAi (see above).

      Reviewer #2 (Recommendations for the authors):

      The number of times the experiments have been performed would be useful to include.

      This information has been added in the figure legends.

    1. Author response:

      We thank the reviewers for their thoughtful comments on our submitted manuscript.

      The major point from all three reviewers was that the sensory inputs may be more complex than simply ASH and AWC, since mutations in osm-9 and tax-4 will affect many more sensory neurons. We fully agree. The differential effects of osm-9 and tax-4 allowed us to recognize that there were two distinct afferent pathways operating simultaneously, mediating repulsion and attraction separately. However, it remains to be determined which sensory neurons are contributing to each pathway. We have planned a full analysis of the sensory inputs, not limited to just ASH and AWC, using neuron-specific rescue and neuron-specific chemogenetic inactivation (using HisCl1). While this analysis falls outside the scope of the present study, we will perform the inactivations of ASH and AWC and include the data in the revised version of this study. We expect to demonstrate whether ASH and AWC inputs are sufficient or whether other sensory neurons make significant contributions. Additionally, we will include chemotaxis dose-response data for osm-9 mutants as part of this analysis and make the minor corrections in data presentation requested.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are disappointed that the reviewers do not acknowledge that our data constitute a major step forward for the field. We will prepare a revised version that takes care of the remaining small issues concerning the technical descriptions and a detailed response to the current round of comments. We will also add a summary of the major new findings of our study.


      The following is the authors’ response to the original reviews.

      We appreciate the time of the reviewers and their detailed comments, which have helped to improve the manuscript.

      Our study presents the largest systematic dataset so far on the evolution of sex-biased gene expression in animals. It is also the first to explore the patterns of individual variation in sex-biased gene expression, and the SBI is an entirely new procedure to directly visualize these variance patterns in an intuitive way.

      Also, we should like to point out that our study contradicts recent conclusions that had suggested that a substantial set of sex-biased genes has conserved functions between humans and mice and that mice can therefore be informative for gender-specific medicine studies. Our data suggest that only a very small set of genes are conserved in their sex-biased expression between mice and humans in more than one organ.

      In the revised version we have made the following major updates:

      - added a rate comparison of gene regulation turnover between sex-biased and non-sex-biased genes

      - added additional statistics to the variance comparisons and selection tests

      - added a regulatory module analysis that shows that much of the gene turnover happens within modules

      - added a mosaic pattern analysis that shows the individual complexity of sex-biased patterns

      - extended introduction and discussion

      Reviewer #1 (Public Review):

      The authors describe a comprehensive analysis of sex-biased expression across multiple tissues and species of mouse. Their results are broadly consistent with previous work, and their methods are robust, as the large volume of work in this area has converged toward a standardized approach.

      I have a few quibbles with the findings, and the main novelty here is the rapid evolution of sex-biased expression over shorter evolutionary intervals than previously documented, although this is not statistically supported. The other main findings, detailed below, are somewhat overstated.

      (1) In the introduction, the authors conflate gametic sex, which is indeed largely binary (with small sperm, large eggs, no intermediate gametic form, and no overlap in size) with somatic sexual dimorphism, which can be bimodal (though sometimes is even more complicated), with a large variance in either sex and generally with a great deal of overlap between males and females. A good appraisal of this distinction is at . This distinction in gene expression has been recognized for at least 20 years, with observations that sex-biased expression in the soma is far less than in the gonad.

      For example, the authors frame their work with the following statement:

      "The different organs show a large individual variation in sex-biased gene expression, making it impossible to classify individuals in simple binary terms. Hence, the seemingly strong conservation of binary sex-states does not find an equivalent underpinning when one looks at the gene-expression makeup of the sexes"

      The authors use this conflation to set up a straw man argument, perhaps in part due to recent political discussions on this topic. They seem to be implying one of two things. a) That previous studies of sex-biased expression of the soma claim a binary classification. I know of no such claim, and many have clearly shown quite the opposite, particularly studies of intra-sexual variation, which are common - see https://doi.org/10.1093/molbev/msx293, https://doi.org/10.1371/journal.pgen.1003697, https://doi.org/10.1111/mec.14408, https://doi.org/10.1111/mec.13919, https://doi.org/10.1111/j.1558-5646.2010.01106.x for just a few examples. Or b) They are the first to observe this non-binary pattern for the soma, but again, many have observed this. For example, many have noted that reproductive or gonad transcriptome data cluster first by sex, but somatic tissue clusters first by species or tissue, then by sex (https://doi.org/10.1073/pnas.1501339112, https://doi.org/10.7554/eLife.67485)

      Figure 4 illustrates the conceptual difference between bimodal and binary sexual conceptions. This figure makes it clear that males and females have different means, but in all cases the distributions are bimodal.

      I would suggest that the authors heavily revise the paper with this more nuanced understanding of the literature and sex differences in their paper, and place their findings in the context of previous work.

      We are sorry that our introduction seems to have been too short to make our points sufficiently clear. Of course, overlapping somatic variation has been shown for morphological characters, but we were aiming to assess this at the sex-biased transcriptome level. Previous studies looking at sex-biased genes were usually limited by the techniques that were available at the time, resulting in a focus on gonads in most studies, and almost all included too few individuals to study within-group variation. We detail this below for the papers that are mentioned by the referee. In view of this, we now cite them as examples of the prevalent focus on gonadal comparisons in most studies. Only Scharmann et al. 2021 on plant leaf dimorphism is indeed relevant for our study with respect to its general findings, and we now make extensive reference to it. In addition, we have generally modified the introduction and substantially extended the discussion to make our points clear.

      Snell-Rood 2010: the paper focuses on sex-specific morphological structures in beetles. It samples six somatic tissues for four individuals each of each class. Analysis is done via microarray hybridizations. While categorical differences were traced, variability between individuals was not discussed. By today's standards, microarrays have too much technical variability in any case to even consider such a discussion.

      Pointer et al. 2013: this paper studies three sexual phenotypes in a bird species, females, dominant males and subordinate males. Tissues include telencephalon, spleen and left gonad. The focus of the analysis is on the gonads, since only few sex-biased genes were found in spleen and brain (according to suppl. Table S1, 0 for the spleen and 2 for the brain). No inferences could be made on somatic variation.

      Harrison 2015: this paper focuses on gonads plus spleen in six bird species, with 2-6 individuals collected for each sex. In the spleen, only one female-biased gene and no male-biased gene was detected. Hence, the data do not allow patterns of somatic variation to be inferred.

      Dean et al. 2016: this paper compares four categories of fish caught around nests, with four to seven individuals per category. Only gonads were analyzed, hence no inferences could be made about somatic variability between individuals.

      Cardoso et al. 2017: this paper tests categories of fish with alternative reproductive tactics based on brain transcriptomes. While it uses 9-10 individuals per category, it uses pools for sequencing, with two pools per category. This does not allow any inference on individual variation.

      Todd et al 2017: this paper focuses on three categories of a fish species, females and dominant and sneaker males. It uses brain and gonads as samples with five individuals each for each category. For the brain, more different genes were found between the two types of males, rather than between females and males (3 and 9 respectively). The paper focuses on individual gene descriptions and does not mention somatic variation.

      Scharmann 2021: the paper focuses on 10 species of plants with sexually dimorphic leaves. 5-6 individuals were sampled per sex. The major finding is that sex-biased gene expression does not correlate with the degree of sexual dimorphism of the leaves. The study also shows a fast evolution of sex-biased expression and states that signatures of adaptive evolution are weak. But it does not discuss variance patterns within populations.

      (2) The authors also claim that "sexual conflict is one of the major drivers of evolutionary divergence already at the early species divergence level." However, making the connection between sex-biased genes and sexual conflict remains fraught. Although it is tempting to use sex-biased gene expression (or any form of phenotypic dimorphism) as an indicator of sexual conflict, resolved or not, as many have pointed out, one needs measures of sex-specific selection, ideally fitness, to make this case (https://doi.org/10.1086/595841, 10.1101/cshperspect.a017632). In many cases, sexual dimorphism can arise in one sex only without conflict (e.g. 10.1098/rspb.2010.2220). As such, sex-biased genes alone are not sufficient to discriminate between ongoing and resolved conflict.

      We imply sexual conflict as a driver of genomic divergence patterns in a similar way as has been done by many authors before (e.g. Mank 2017a, Price et al. 2023, Tosto et al. 2023). While we fully appreciate the point of the referee, we do not really see where we deviate from the standard wording that is used in the context of genomic data. In such data, it is of course usually assumed that they represent solved conflicts (Figure 1D in Cox and Calsbeek) where selection differentials would not be measurable anyway. (Please note also that the phylogenetic approach used in Oliver and Monteiro 2010 becomes rather problematic in view of introgressive hybridization patterns in butterflies.) We have extended the discussion to address this.

      (3) To make the case that sex-biased genes are under selection, the authors report alpha values in Figure 3B. Alpha value comparisons like this over large numbers of genes often have high variance. Are any of the values for male- female- and un-biased genes significantly different from one another? This is needed to make the claim of positive selection.

      Sorry, we had accidentally not included the statistics in the final version of the figure. We have added this now in the supplementary table but have also generally changed the statistical approach and the design of the figure.

      Reviewer #2 (Public Review):

      The manuscript by Xie and colleagues presents transcriptomic experiments that measure gene expression in eight different tissues taken from adult female and male mice from four species. These data are used to make inferences regarding the evolution of sex-biased gene expression across these taxa. The experimental methods and data analysis are appropriate; however, most of the conclusions drawn in the manuscript have either been previously reported in the literature or are not fully supported by the data.

      We are not aware of any study that has analyzed somatic sex-biased expression in such a large and taxonomically well-resolved set of closely related animal taxa. Only the study by Scharmann et al. 2021 on plant leaves comes close to it, but even this did not specifically analyze the intragroup variation aspects. Of course, some of our results confirm previous conclusions, but we should still like to point out that they go far beyond them.

      There are two ways the manuscript could be modified to better strengthen the conclusions.

      First, some of the observed differences in gene expression have very little to no effect on other phenotypes, and are not relevant to medicine or fitness. Selectively neutral gene expression differences have been inferred in previous studies, and consistent with that work, sex-biased and between-species expression differences in this study may also be enriched for selectively neutral expression differences. This idea is supported by the analysis of expression variance, which indicates that genes that show sex-biased expression also tend to show more inter-individual variation. This perspective is also supported by the MK analysis of molecular evolution, which suggests that positive selection is more prevalent among genes that are sex-biased in both mus and dom, and genes that switch sex-biased expression are under less selection at the level of both protein-coding sequence and gene expression.

      We have now revisited these points by additional statistical analysis of the variance patterns and an extended discussion under the heading "Neutral or adaptive?". 

      As an aside, I was confused by (line 176): "implying that the enhanced positive selection pressure is triggered by their status of being sex-biased in either taxon." - don't the MK values suggest an excess of positive selection on genes that are sex-biased in both taxa?

      There are different sets of genes that are sex-biased in these two taxa - hence this observation is actually a strong argument for selection on these genes. We have changed the corresponding text to make this clearer.

      Without an estimate of the proportion of differentially expressed genes that might be relevant for broader physiological or organismal phenotypes, it is difficult to assess the accuracy and relevance of the manuscript's conclusions. One (crude) approach would be to analyze subsets of genes stratified by the magnitude of expression differences; while there is a weak relationship between expression differences and fitness effects, on average large gene expression differences are more likely to affect additional phenotypes than small expression differences.

      We agree that it remains a challenge to show functional effects for the sex-biased genes. The argument that they should have a function is laid out above (and stated in many reviews on the topic). Using the expression level as a proxy of function does not seem justified, given the current literature. For example, genes that are highly connected in modules are not necessarily highly expressed (e.g. transcription factors). Also, genes may be highly expressed in a rare cell type of an organ and have an important function there, but this would not show up across the RNA of the whole organ. The most direct functional relationship between sex-biased expression and phenotype comes from the human data in Naqvi et al. 2019 - which we had cited.

      Another perspective would be to compare the within-species variance to the between-species variance to identify genes with an excess of the latter relative to the former (similar logic to an MK test of amino acid substitutions).

      Such an analysis was actually our initial motivation for this study. However, the new (and surprising!) result is that the status of being sex-biased shows such a high turnover that not many genes are left per organ where one could even try to make such a test. Nevertheless, we have extended the variance analysis with reciprocal gene sets (as we had done for the MK test) and extended the discussion on the topic, including citation of our prior work on these questions.

      Second, the analysis could be more informative if it distinguished between genes that are expressed across multiple tissues in both sexes that may show greater expression in one sex than the other, versus genes with specialized function expressed solely in (usually) reproductive tissues of one sex (e.g. ovary-specific genes). One approach to quantify this distinction would be metrics like those defined by [Yanai I, et al. 2005. Genome-wide midrange transcription profiles reveal expression-level relationships in human tissue specification. Bioinformatics 21:650-659.] These approaches can be used to separate out groups of genes by the extent to which they are expressed in both sexes versus genes that are primarily expressed in sex-specific tissue such as testes or ovaries. This more fine-grained analysis would also potentially inform the section describing the evolution/conservation of sex-biased expression: I expect there must be genes with conserved expression specifically in ovaries or testes (these are ancient animal structures!) but these may have been excluded by the requirement that genes be sex-biased and expressed in at least two organs.

      Given that our study focuses on somatic sex-biased genes, we refrain from a comparative analysis of genes that are only expressed in the sex organs in this paper. With respect to sharing of sex-biased gene expression between the somatic tissues, we show in Figure 8 that there are only very few such genes (8 female-biased and 3 male-biased). A separate statistical treatment is not possible for this small set of genes.
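
      For readers unfamiliar with the metric the reviewer refers to, the tissue-specificity index usually attributed to Yanai et al. 2005 (often written tau) can be sketched as below. This is an illustration only, not an analysis performed in the study; the function name `tau_index` and the toy expression profiles are assumptions made for the example.

```python
def tau_index(expression):
    """Tissue-specificity index tau for one gene.

    `expression` holds one non-negative expression value per tissue.
    tau is 0 for uniform expression and 1 for expression in a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Sum of (1 - normalized expression) over tissues, scaled by n - 1.
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Hypothetical profiles across four tissues (values are made up).
broad = [5.0, 5.0, 5.0, 5.0]      # expressed equally everywhere
specific = [0.0, 0.0, 8.0, 0.0]   # expressed in one tissue only
print(tau_index(broad), tau_index(specific))
```

      Genes with tau close to 1 would correspond to the organ-specific class the reviewer describes, while broadly expressed genes with a sex difference in level would have low tau.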

      There are at least three examples of statements in the discussion that at the moment misinterpret the experimental results.

      The discussion frames the results in the context of sexual selection and sexually antagonistic selection, but these concepts are not synonymous. Sexual selection can shape phenotypes that are specific to one sex, causing no antagonism; and fitness differences between males and females resulting from sexually antagonistic variation in somatic phenotypes may not be acted on by sexual selection. Furthermore, the conditions promoting and consequence of both kinds of selection can be different, so they should be treated separately for the purposes of this discussion.

      We cannot make such a distinction for gene expression patterns - and we are not aware that this was done before in the literature (except where gene expression was directly linked to a morphological structure). We have updated this discussion accordingly.

      The discussion claims that "Our data show that sex-biased gene expression evolves extremely fast" but a comparison or expectation for the rate of evolution is not provided. Many other studies have used comparative transcriptomics to estimate rates of gene expression evolution between species, including mice; are the results here substantially and significantly different from those previous studies? Furthermore, the experimental design does not distinguish between those gene expression phenotypes that are fixed between species as compared to those that are polymorphic within one or more species which prevents straightforward interpretation of differences in gene expression as interspecific differences.

      Our statement was in relation to the comparison between somatic and gonadal gene turnover, as well as the comparison to humans. We have now included an additional analysis for a direct comparison with non-sex-biased genes in the same populations (Figure 2B). Note that gene expression variances cannot get fixed anyway, they can only become different in average and magnitude.

      The conclusion that "Our results show that most of the genetic underpinnings of sex differences show no long-term evolutionary stability, which is in strong contrast to the perceived evolutionary stability of two sexes" - seems beyond the scope of this study. This manuscript does not address the genetic underpinnings of sex differences (this would involve eQTL or the like), rather it looks at sex differences in gene expression phenotypes.

      This comes back to the points discussed above about the validity of inferring function from sex-biased expression. We have updated the text to clarify this.

      Simply addressing the question of phenotypic evolutionary stability would be more informative if genes expressed specifically in reproductive tissues were separated from somatic sex-biased genes to determine if they show similar patterns of expression evolution.

      Our study is generally focused on somatic gene expression. The comparison with reproductive tissues serves merely as a reference. Since they are of course very different tissues, they should not be compared with each other in the same way. We have now specifically addressed this point in the discussion.

      Reviewer #3 (Public Review):

      This manuscript reports some interesting and important patterns. The results on sex-bias in different tissues and across four taxa would benefit from alternative (or additional) presentation styles. In my view, the most important results are with respect to alpha (fraction of beneficial amino acid changes) in relation to sex-bias (though the authors have made this as a somewhat minor point in this version).

      The part that the authors emphasize I don't find very interesting (i.e., the sexes have overlapping expression profiles in many nongonadal tissues), nor do I believe they have the appropriate data necessary to convincingly demonstrate this (which would require multiple measures from the same individual).

      This is the first study that reports such overlaps, and we show that this is not always the case (e.g. liver and kidney data in mice). We are not aware of any predictions of what such patterns would look like and how they would evolve - why should such a new finding not be interesting? Concerning the appropriateness of the data, we do not agree with the point the referee makes - see response below.

      This study reports several interesting patterns with respect to sex differences in gene expression across organs of four mice taxa. An alternative presentation of the data would yield a clearer and more convincing case that the patterns the authors claim are legitimate.

      I recommend that the authors clarify what qualifies as "sex-bias".

      This is defined by the statistical criteria that we have applied, following the general standard of papers on this topic.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "However, already Darwin has pointed out that the phenotypes of the sexes should evolve fast". I think the authors mean that Darwin was quick to point out that sex-specific phenotypes evolve quickly.

      We have modified this text part.

      (2) Non-gonadal is more often referred to as somatic. I would encourage the authors to use this more common term for accessibility.

      We have adopted this term.

      (3) Figure 5 is interesting, however, it is difficult to know whether the decreased bimodality in humans compared to mice is biological or technical due to the differences in the underlying data. For example, the mouse samples tightly controlled age and environmental conditions within each species. It is not possible to do that with human samples, and there are very good reasons to think that these factors will affect variance in both sexes.

      Yes, this is certainly true and we know this also from other comparative data between mice and humans. Still, this is human reality vs mouse artificialness. We now pick this up in the discussion.

      (4) Line 273. The large numbers of cells needed for single-cell analysis require that most studies pool multiple samples, however these pools are helpful in themselves. This approach was used by https://doi.org/10.1093/evlett/qrad013 to quantify the degree of sex-bias within cell types across multiple tissues and to compare how bulk and single-cell sex-bias measures compare. Sex-bias in some somatic cell types was very high, even when bulk sex-bias in those tissues was not. This suggests that the bulk data the authors use in this study may in fact obscure the pattern of sex-bias.

      Yes, we agree, and this is exactly how we did the analysis and interpretation, based on the cited paper.

      (5) Line 379 "Total RNAs were" should be "Total RNA was"

      Corrected

      References cited in this review and which should be included in the manuscript:

      Sam L Sharpe, Andrew P Anderson, Idelle Cooper, Timothy Y James, Alexandra E Kralick, Hans Lindahl, Sara E Lipshutz, J F McLaughlin, Banu Subramaniam, Alicia Roth Weigel, A Kelsey Lewis, Sex and Biology: Broader Impacts Beyond the Binary, Integrative and Comparative Biology, Volume 63, Issue 4, October 2023, Pages 960-967.

      Included

      Pointer MA, Harrison PW, Wright AE, Mank JE (2013) Masculinization of Gene Expression Is Associated with Exaggeration of Male Sexual Dimorphism. PLOS Genetics 9(8): e1003697.

      Included

      Erica V Todd, Hui Liu, Melissa S Lamm, Jodi T Thomas, Kim Rutherford, Kelly C Thompson, John R Godwin, Neil J Gemmell, Female Mimicry by Sneaker Males Has a Transcriptomic Signature in Both the Brain and the Gonad in a Sex-Changing Fish, Molecular Biology and Evolution, Volume 35, Issue 1, January 2018, Pages 225-241.

      Included

      Cardoso SD, Gonçalves D, Goesmann A, Canário AVM, Oliveira RF. Temporal variation in brain transcriptome is associated with the expression of female mimicry as a sequential male alternative reproductive tactic in fish. Mol Ecol. 2018; 27: 789-803.

      Included

      Dean, R., Wright, A.E., Marsh-Rollo, S.E., Nugent, B.M., Alonzo, S.H. and Mank, J.E. (2017), Sperm competition shapes gene expression and sequence evolution in the ocellated wrasse. Mol Ecol, 26: 505-518.

      Included

      Emilie C. Snell‐Rood, Amy Cash, Mira V. Han, Teiya Kijimoto, Justen Andrews, Armin P. Moczek, DEVELOPMENTAL DECOUPLING OF ALTERNATIVE PHENOTYPES: INSIGHTS FROM THE TRANSCRIPTOMES OF HORN‐POLYPHENIC BEETLES, Evolution, Volume 65, Issue 1, 1 January 2011.

      Not included, since its technical approach is not really comparable

      Harrison PW, Wright AE, Zimmer F, Dean R, Montgomery SH, Pointer MA, Mank JE (2015) Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences, USA 112: 4393-4398.

      Included

      Mathias Scharmann, Anthony G Rebelo, John R Pannell (2021) High rates of evolution preceded shifts to sex-biased gene expression in Leucadendron, the most sexually dimorphic angiosperms eLife 10:e67485.

      Included

      Sexually Antagonistic Selection, Sexual Dimorphism, and the Resolution of Intralocus Sexual Conflict. Robert M. Cox and Ryan Calsbeek , The American Naturalist 2009 173:2, 176-187.

      Included

      Ingleby FC, Flis I, Morrow EH. Sex-biased gene expression and sexual conflict throughout development. Cold Spring Harb Perspect Biol. 2014 Nov 6;7(1):a017632.

      Included

      Oliver JC, Monteiro A 2011. On the origins of sexual dimorphism in butterflies. Proc Biol Sci 278: 1981-1988.

      Included

      Iulia Darolti, Judith E Mank, Sex-biased gene expression at single-cell resolution: cause and consequence of sexual dimorphism, Evolution Letters, Volume 7, Issue 3, June 2023, Pages 148-156.

      Included

      Reviewer #2 (Recommendations For The Authors):

      I am concerned the smoothed density plots in Figure 4 may be providing a misleading sense of the distributions since each distribution is inferred from only 9 values. A boxplot might better represent the data to the reader.

      Boxplots with 9 values are much more difficult for a reader to interpret; this is the very reason why one tends to smooth them. In this way, they also become similar to the standard plots that are used for showing morphological variation between the sexes. Note that the original data are available for the individual values, if these are of special interest in some cases. In addition, our new “mosaic” analysis (Figure 6) provides another presentation for readers.

      Line 235: "the overall numbers are lower" I assume this is the number of genes included in the analyses, but this should be explicitly stated.

      Clarified in the text

      The analysis of gene expression from different brain regions in control individuals from the Alzheimer's study (line 273) suffers from low power and it is not clear to me how much taking samples from different brain regions eliminates the issue of different cell types within a sample (the stated motivation for this analysis). While I support publishing negative results, this section does not feel like it adds much to the manuscript and could be cut in my opinion.

      This is actually a study on single cell types, differentiating each of them. We are sorry that the text was apparently unclear about this. Given that there are studies that show the importance of looking at single-cell data, we still think that this is a suitable analysis. We have updated the text to make it clearer.

      It might be useful to separate out X-linked genes from autosomal genes to see if they show consistent patterns with regard to sex-bias.

      We have added this information in suppl. Table S2 and included some description in the text.

      Reviewer #3 (Recommendations For The Authors):

      Comments follow the order of the Results section:

      (1) The latter half of this line in the Methods is too vague to be helpful: "We have explored a range of cutoffs and found that a sex-bias ratio of 1.25-fold difference of MEDIAN expression values combined with a Wilcoxon rank sum test and Benjamini-Hochberg FDR correction (using FDR <0.1 as cutoff) (Benjamini & Hochberg, 1995) yields the best compromise between sensitivity and specificity". What precisely is meant by "the best compromise between sensitivity and specificity"?

      We now explain that this was based on pre-tests comparing randomized with actual data. However, we agree that this is ultimately a subjective decision, and there is no single standard used in the literature, especially when somatic organs are included. We consider our criteria rather stringent.
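
      The combined criterion quoted above (a 1.25-fold difference of median expression plus an FDR cutoff of 0.1 after Benjamini-Hochberg correction) can be illustrated with a minimal sketch. It is not the code used for the manuscript. It assumes that the per-gene Wilcoxon rank sum p-values have already been computed elsewhere, and the function names, the use of "at least 1.25-fold" rather than "more than 1.25-fold", and the toy numbers are all assumptions made for illustration.

```python
from statistics import median


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_sex_bias(female_tpm, male_tpm, qvalue,
                      ratio_cutoff=1.25, fdr_cutoff=0.1):
    """Label one gene in one organ (hypothetical helper).

    female_tpm / male_tpm: TPM values of the individuals of each sex.
    qvalue: FDR-adjusted p-value of the rank sum test for this gene.
    """
    if qvalue >= fdr_cutoff:
        return "unbiased"
    f_med, m_med = median(female_tpm), median(male_tpm)
    # Compare by multiplication to avoid dividing by a zero median.
    if f_med > 0 and f_med >= ratio_cutoff * m_med:
        return "female-biased"
    if m_med > 0 and m_med >= ratio_cutoff * f_med:
        return "male-biased"
    return "unbiased"


# Toy p-values for four genes (made-up numbers).
qvalues = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
label = classify_sex_bias([12, 13, 14], [8, 9, 10], qvalues[0])
```

      In this sketch a gene must pass both filters, so a large ratio with a non-significant test, or a significant test with a ratio below 1.25, is reported as unbiased.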

      (2) The 1.25 number for sex bias is, ultimately, an arbitrary cut-off. It is common in this literature to choose some arbitrary level and, in this sense, the authors are following common practice. The choice of 1.25 should be stated in the main text as it is a lower (but not unreasonable) value than has been used in many other papers.

      It is not only the cutoff, but also the Wilcoxon test and FDR correction that define the threshold. See also comment above.

      (3) In truth, dimorphism is continuous rather than discrete (i.e., greater or less than 1.25 fold different). Thus, where possible it would be useful to present results in a fashion that allows readers to see the continuous range of ratios rather than having to worry about whether the patterns are due to the rather arbitrary choices of how genes were binned into sex-bias categories.

      It is necessary to work with cutoffs in such cases, and this is the usual practice for any such paper. However, we now provide plots of the female/male ratio distributions in Figure 1, Figure supplement 1.

      a) Number of genes that are female- / male-biased. I would like to be able to see a version of Figure 1 showing the full distribution of TPM ratios rather than bar graphs of the numbers of (arbitrarily defined) female- and male-biased genes. This will be, of course, a larger figure (a full distribution rather than 2 bars for each species for each organ) and so could be relegated to Supplementary Material (assuming the message of that figure is the same as the current Figure 1).

      This is a very unusual request, given that no other paper has done this either. It would indeed result in an unmanageable figure size, or many separate figures that would be difficult to scrutinize. Note that there would be one plot of two (female and male) TPM distributions for each sex-biased gene in each organ and each taxon, leading to hundreds of thousands of plots. We think that providing the general distributions as plots (see above) and the original data as supplements is sufficient.

      b) Turnover of genes with sex bias. This important issue is addressed in Figure 2. First, it is not precisely clear what "percentages of sums of shared genes for any pairwise comparison" in Figure 2 legend means and no further detail is given in the Methods; this must be made clearer or the info in Figure 2 is meaningless. Regardless, this approach again relies heavily on the arbitrary criterion of defining sex-bias. Thus, I would like to see correlation plots of the log(TPM ratio) between taxa as done in the classic multispecies fly paper of Zhang et al. 2007. In Figure 2 it is quite clear that male-biased genes evolve with respect to sex bias more rapidly than female-biased genes.

      We have provided a better explanation of this analysis. Note that the Zhang et al. 2007 paper was not focussing on somatic expression and covers a much broader evolutionary spectrum. Hence, the results are not comparable. Also, we doubt that it would be so helpful to generate a huge figure with all these plots.

      (4) Is there a simpler explanation for the results in the "Variance patterns" section? The total variance for any variable can be decomposed into the variance within and among "groups". If we use "sex" as the group, then there are genes - labelled sex-biased genes - that were identified as such, in essence, because they have high among-group variance. Given that we then know a priori at the start of this section that sex-biased genes have high among-group variance, is it at all surprising that they have higher total variance than the unbiased genes (which we know a priori have low among-group variance)? Perhaps I misunderstood the point of this section. Maybe it would be more meaningful to examine the WITHIN-SEX variance (averaged across the two sexes) instead.

      We did calculate IQR/median (“normalized variance”) with the nine mice for each gene and each sex in each organ, hence sex is not a variance factor in this calculation. The algorithm steps are outlined in suppl. Table S17. We have now also added a variance calculation for reciprocal gene sets and added an extended discussion of these results.
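
      The within-sex IQR/median calculation described above can be sketched as follows. This is a hedged illustration, not the algorithm of suppl. Table S17: the function name is hypothetical, the quartile convention (the "inclusive" method of Python's statistics module) is an assumption, and the nine TPM values are made up.

```python
from statistics import median, quantiles


def normalized_variance(tpm_values):
    """IQR divided by the median for one gene, one sex, one organ.

    Only individuals of a single sex are passed in, so sex is not a
    source of variance in the result.
    """
    # Quartiles with the 'inclusive' convention (an assumption; other
    # conventions give slightly different values for small samples).
    q1, _, q3 = quantiles(tpm_values, n=4, method="inclusive")
    med = median(tpm_values)
    if med == 0:
        return float("nan")  # ratio undefined when the median is zero
    return (q3 - q1) / med


# Nine hypothetical female TPM values for one gene in one organ.
females = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ratio = normalized_variance(females)  # (7 - 3) / 5 = 0.8
```

      Because each sex is evaluated separately, a large difference between the female and male medians does not inflate this measure by itself.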

      (5) Analysis of alpha for sex-biased genes. This was the most interesting part of this manuscript to me.

      (a) More information about what SNVs were used is required.

      i. Were only sites where SPR was fixed used? (If not, how was polarization done?)

      ii. Were sites only considered diverged if they were fixed for different bases in DOM and MUS? (If not, what was the criteria?)

      iii. Using, say, DOM as the focal species, a site must be polymorphic in DOM. But did its status (polymorphic/fixed) in MUS matter?

      We have added a more detailed description of this in the Methods section. The direct answers to the three questions are: (i) yes; (ii) yes; (iii) no. Because DOM and MUS are two subspecies of Mus musculus that separated recently, a variant might have arisen before the separation, and there might be gene flow between them.

      (b) A particularly interesting part of the analysis is the investigation of alpha for genes that are NOT sex-biased in one taxon but are sex-biased in the other. At the moment (as I understand it), alpha is only calculated for these genes in the taxon where they are NOT sex-biased (and this alpha value can be compared to the alpha of sex-biased genes and of unbiased genes in that taxon). I would like to see both sets of genes (set 1: those sex-biased in MUS and not in DOM; set 2: those sex-biased in DOM and not in MUS) analyzed in each of the 2 species, with results presented in a 2x2 table.

      By definition of these categories, these genes are sex-biased in the respective other taxon, hence the values are already in the table. They are named as “reciprocal”.

      (c) No confidence intervals are given for the alpha values, despite the legend of Figure 3 referring to them.

      These were accidentally omitted. We have now included the full table in suppl. Table S6, and Figure 3 was modified to show violin plots of the bootstrap distributions.

      The authors' creation and use of a "sex-bias index" (SBI). My greatest skepticism of this manuscript is with respect to the value of their manufactured index, SBI. Of course, it is possible to create such an index but does this literature really need this index or does this just add to the "clutter" in the literature for this field? Is it helping to illuminate important patterns? This index is presumably some attempt to quantify how "male-like" or "female-like" overall expression is for a given individual (for a given organ). It is calculated as SBI = (MEDIAN of all female-biased tpm) - (MEDIAN of all male-biased tpm).
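
      How a bootstrap distribution and confidence interval for alpha can be obtained is sketched below. The estimator used in the manuscript is not stated in this response, so the sketch assumes the classical McDonald-Kreitman-based estimator alpha = 1 - (Ds * Pn) / (Dn * Ps) with counts summed over genes, and it assumes resampling of genes with replacement. The function names and the per-gene counts are hypothetical.

```python
import random


def mk_alpha(genes):
    """alpha = 1 - (Ds * Pn) / (Dn * Ps), with counts summed over genes.

    Each gene is a tuple (Dn, Ds, Pn, Ps): nonsynonymous and synonymous
    divergent sites, then nonsynonymous and synonymous polymorphic sites.
    """
    dn = sum(g[0] for g in genes)
    ds = sum(g[1] for g in genes)
    pn = sum(g[2] for g in genes)
    ps = sum(g[3] for g in genes)
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


def bootstrap_alpha(genes, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling genes with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        mk_alpha(rng.choices(genes, k=len(genes))) for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper, estimates


# Made-up counts for four genes: (Dn, Ds, Pn, Ps).
toy_genes = [(10, 20, 4, 16), (6, 15, 3, 12), (8, 10, 2, 10), (12, 25, 5, 20)]
point_estimate = mk_alpha(toy_genes)
ci_low, ci_high, boot = bootstrap_alpha(toy_genes, n_boot=200)
```

      The sorted list of bootstrap estimates is what a violin plot of the bootstrap distribution would display, and the two percentile values give the interval to report next to the point estimate.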

      (6) A main result that comes from this is that the sexes tend to overlap for these values for most nongonad tissues but are clearly distinct for gonadal tissues. I do not think this result would come as a surprise to almost anyone and I'm far from convinced that this metric is a good way to quantify that point. Let's consider testes vs. ovaries. Compared to non-gonadal tissues, I am reasonably certain that not only are there many more genes that are classified as "sex-biased" in gonads but also the magnitude of sex-bias among these genes is typically much greater than it is for the so-called sex-biased genes in nongonadal tissue (density plots requested in #3a would make this clear). In other words, males and females are, on average, very different with respect to expression in gonads so even allowing for variation within each sex will still result in a clear separation of all individuals of the two sexes. In contrast, males and females are, on average, much less different in, say, heart so when we consider the variation within each sex, there is overlap. One could imagine a variety of different metrics which could be used to make this point. The merits of "SBI" are unclear. It is a novel metric and its properties are poorly understood. (A simple alternative would be looking at individual scores along the axis separating mean/median males and females; almost certainly, for gonads, this would be very similar to PC scores for PC1.)

      As throughout the text, we use gonadal comparisons only as a general reference, not as the main result. The main result that we are stressing is the fast turnover of these patterns, including from binary to overlapping for kidney and liver in mouse. We consider this a new finding. If it comes "not to a surprise to anyone", isn't it great that one no longer has to guess but finally has real data on this?

      We have now also added a mosaic analysis to show that the SBI can be used as summary measure in different presentations.

      The use of a single PC axis is not a good alternative, since it throws away the information from the other axes.

      We have now included an explicit discussion on the usefulness of the SBI.

      (7) For simplicity, let's assume all males are identical and all females are identical. Let's imagine that heart and kidney have the exact same set of sex-biased genes. There are 20 female-biased genes; they all happen to be identical in expression level (within tissue) and look like this:

      Tissue    Female TPM    Male TPM    TPM ratio (F:M)
      Heart     4             2           2
      Kidney    40            20          2

      And there are 20 male-biased genes that look like this:

      Tissue    Female TPM    Male TPM    TPM ratio (F:M)
      Heart     1             3           1/3
      Kidney    10            30          1/3

      Most people would describe these two tissues as equally sex-biased.

      However, the SBIs would be:

      Tissue    Female SBI      Male SBI         Sex difference (F - M)
      Heart     4 - 1 = 3       2 - 3 = -1       4
      Kidney    40 - 10 = 30    20 - 30 = -10    40

      Is it a desirable property that by this metric these two tissues have wildly different SBI values for each sex as well as for the difference between sexes? (At the very least, shouldn't you make readers aware of these strange properties of SBI so they can decide how much value they put into them?)

      Actually, in this example the simple ratio between the expression levels has a strange property, since it does not reflect a much higher expression of the relevant genes in the kidney. The SBI is actually more suitable for making such cases clear. Of course, this is under the assumption that expression level has a meaning for the phenotype, but this is the general assumption for all RNA-Seq experiment comparisons.
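
      The arithmetic of this toy example can be checked with a short sketch that applies the SBI definition quoted above (median TPM of the female-biased genes minus median TPM of the male-biased genes) to the reviewer's hypothetical heart and kidney values. The function and variable names are illustrative only.

```python
from statistics import median


def sbi(female_biased_tpm, male_biased_tpm):
    """Sex-bias index of one individual in one organ."""
    return median(female_biased_tpm) - median(male_biased_tpm)


# Reviewer's toy case: 20 female-biased and 20 male-biased genes, each
# with identical TPM within a tissue and sex.
toy = {
    "Heart": {"female": ([4] * 20, [1] * 20), "male": ([2] * 20, [3] * 20)},
    "Kidney": {"female": ([40] * 20, [10] * 20), "male": ([20] * 20, [30] * 20)},
}

results = {}
for tissue, sexes in toy.items():
    female_sbi = sbi(*sexes["female"])
    male_sbi = sbi(*sexes["male"])
    # F:M ratio of the female-biased genes, for comparison with the SBI.
    fold = median(sexes["female"][0]) / median(sexes["male"][0])
    results[tissue] = (female_sbi, male_sbi, female_sbi - male_sbi, fold)
```

      The fold ratio is 2 in both tissues, whereas the sex difference in SBI is 4 for heart and 40 for kidney: the ratio ignores the tenfold difference in expression level, and the SBI scales with it.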

      (8) With respect to Figure 4, why do females often have mean SBI values close to zero or even negative (e.g., kidney, mammary glands)? Is this simply because the female-biased genes tend to have lower TPM than the male-biased genes? It seems that the value zero for this metric is really not very biologically meaningful because this metric is a difference of two things that are not necessarily expected to be equal.

      This is the extra information about the expression levels that is gained via the SBI values (see comment above). However, we noticed that people can get confused about this. We have now added a re-scaling step to focus completely on the variance information in these plots.

      (9) Interpreting variances. A substantial fraction of the latter half of the manuscript focuses on interpreting variances among individual samples. This is problematic because there is no replication within individuals (i.e., "repeatability"), thus it is impossible to infer the extent to which observed variance among individuals of a given group (e.g., among females) is due to true biological differences among individuals or is simply due to noise (i.e., "measurement error" in the broad sense). Is the larger variance for mammary glands than liver or gonads just due to measurement error? What is the evidence?

      This point was of course a major issue at the time when microarrays were used for transcriptome studies. However, the first systematic RNA-Seq studies already showed that the technical replicability is so high that technical replicates are not required. In fact, practically all RNA-Seq studies are done without technical replicates for this reason.

      (10) Because I have little confidence in the SBI metric (#7-8) and in interpreting within sex variances (#9), I found little value in the human results and how SBI distributions (and degree of overlap between sexes) compare between humans and mice.

      We disagree - the current published status is that there are thousands of sex-biased genes in humans and this has implications for gender-specific medicine (Oliva et al. 2020). Our results show a much more nuanced picture in this respect.

      (11) I found even less value in the single-cell data. It too suffers from the issues above. Further, as the authors more or less state, the data are too limited to say much of value here. It is impossible to tell to what extent the results are simply due to data limitations.

      We have pointed out that it is still valuable to have them. They are good enough to exclude the possibility that only a small set of cells drives the overall pattern across an organ. We have further clarified this in the text.

      (12) The code for data analysis should be posted on GitHub or some other repository.

      The code for the sex-biased gene detection and analysis has been posted on GitHub (see Code availability in the manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      Weaknesses:

      As this paper only uses anatomical analyses, no functional interpretations of cell function are tested.

      The aim of this paper was to describe the ultrastructural organization of compound eyes in the extremely small wasp Megaphragma viggianii. The authors successfully achieved this aim and provided an incredibly detailed description of all cell types with respect to their location, volume, and dimensions. As this is the first of its kind, the results cannot easily be compared with previous work. The findings are likely to be an important reference for future work that uses similar techniques to reconstruct the eyes of other insect species. The FIB-SEM method used is being used increasingly often in structural studies of insect sensory organs and brains and this work demonstrates the utility of this method.

      Thank you for your high assessment of our work. Unfortunately, it is hard to test our functional interpretations and check them with electrophysiological methods due to the extremely small size of the animal. Studies on three-dimensional ultrastructural datasets obtained using vEM have just started to appear, and we hope that a lot of data will become available for comparison in the near future.

      Reviewer #2:

      Thank you for your work and for your high assessment of our manuscript.

      Reviewer #3:

      Weaknesses:

      The claim that the large dorsal part of the eye is the dorsal rim area (DRA), supported by anatomical data on rhabdomere geometry and connectomics in authors' earlier work, would eventually greatly benefit from additional evidence, obtained by immunocytochemical staining, that could also reveal a putative substrate for colour vision. The cell nuclei that are located in the optical path in the DRA crystalline cone have only a putative optical function, which may be either similar to pore canals in hymenopteran DRA cornea (scattering) or to photoreceptor nuclei in camera-type eyes (focussing), both explanations being mutually exclusive.

      We thank the Reviewer for the high assessment of our study and for the detailed analysis of our manuscript. Your comments and recommendations are highly valued and helped us to improve the text. We understand that immunocytochemical methods could improve our findings and supply additional evidence, but this is not technically possible at present. Megaphragma is a very complicated model organism for such methods. We are currently working on the optimization of the protocol for staining, which is needed because of the high level of autoluminescence and because of insufficient penetration of dyes into the samples.

      Recommendations for the authors:

      Reviewer #1:

      I do not have any major concerns about the content of the paper.

      There are some minor spelling and grammatical errors throughout the text but these can be identified most readily using a spelling/grammar check.

      We have revised the text, checked the spelling, and fixed the grammatical errors throughout the text.

      I suggest consistency when referring to the capitalization of the term 'non-DRA' as it is sometimes 'Non-DRA' in the text.

      We have fixed the term “non-DRA” throughout the text. Thank you.

      Also, check carefully the spelling of headings in the tables as there are a few mistakes in Table 1 and 5 in particular.

      The grammar errors have been fixed.

      Figure 7 legend: an explanation of the abbreviation RPC should be added.

      We have done so.

      Reviewer #2:

      (1) The paper presents the data in great detail, however, since this is the first time the technique has been applied to get whole insect eyes, even if on a small insect, it would be worth outlining in the methods section what innovations in the staining/ scanning or sample preparation allowed these improvements and a roadmap for extending this method to larger insects if possible.

      The whole method, including sample preparation, staining, and scanning, was described in full detail in our previous paper (Polilov et al., 2021). Because the methodology is complex, we consider it unnecessary to include all stages of the technique in the present paper and have therefore described it more briefly.

      (2) The optical modelling needs a statement in the discussion providing a disclaimer on parameters like sensitivity. Anatomical measurements can provide limits and some measure, but the inherent optics are also key, and it is worth qualifying these as only estimates and measurements that give a sense of the variation in morphology. Only when coupled with optical and potentially neural measurements could one confirm the true sensitivity and acceptance angle.

      In the absence of experimental data or precise computational models of Megaphragma vision, we try to discuss rather carefully the functions of structures based on their morphology, ultrastructure, first-order visual connectome, and analogies with other species. This is reflected in the methods and those sections of our paper that contain functional interpretations.

      Reviewer #3:

      (1) The finding that the CNS neurons are enucleated, while the compound eye contains cell nuclei, deserves another word. I would confidentially say that the optical demands of a miniaturized compound eye (the minimal size of the optics due to diffraction, the rhabdomere size, and the minimal thickness of optically insulating granules) are such that further cellular miniaturization is not possible, and the minimal sizes even render the cells that build the eye sufficiently large to accommodate cell nuclei. This is in my opinion a parsimonious explanation, yet speculative and I leave it up to you to embrace it or not.

      We agree with the Reviewer and understand the limiting factors and the optical demands of a miniaturized compound eye. According to our data, nuclei occupy a considerable volume in the eye (there are more nuclei in the cells of the compound eye than in the whole brain), and on average the cell volume is larger than in Trichogramma, which is minute, but larger than Megaphragma. But as the Reviewer rightly assumed, this explanation is speculative; therefore, we would like to avoid it.

      (2) Our current understanding of DRA optics and function is limited and I claim that your interpretation of the cell nuclei in the DRA dioptrical apparatuses is inappropriate. Please consider a few articles on hymenopteran DRA, starting with the one below and the citing literature:

      Meyer, E.P., Labhart, T. Pore canals in the cornea of a functionally specialized area of the honey bee's compound eye. Cell Tissue Res. 216, 491-501 (1981). https://doi.org/10.1007/BF00238646

      Honeybee DRA has a milky appearance under a stereomicroscope and can be discerned from the outside. This is due to pore canals in the cornea. I happen to be studying this exact structure and its function right now. I found that the result of those canals is not so much the extended receptor acceptance angles, but rather a minimized light gain. This is counterintuitive, but think of the following. The DRA photoreceptors must encode the limited range of polarization contrasts with a maximal working dynamic range (= voltage) of the photoreceptors, which results in a very steep stimulus-response curve.

      Physiologically such a curve is due to very high transduction gain and a high cell input resistance. In most of the retina, small contrasts are transcoded by LMC neurons, but DRA receptors are long visual fibres and must do the job themselves. The skylight intensity (especially antisolar, where the polarized pattern is maximal) varies little during the day. Hence, the DRA receptors work almost at a fixed intensity range. In order to prevent receptor saturation and keep steep contrast coding, the corneal lenses in DRA have a built-in diffusor ring, which diminishes the light influx. Unfortunately, I have yet to publish this and I may be wrong, of course. But if I look into your data, I see consistently smaller corneal lenses and crystalline cones in the DRA, plus the cell nuclei obstructing the incident light. I think this is similar to the optics of honeybee DRA.

      You do not support your claim that the nuclei additionally focus light by optical calculations, but cite literature on camera-type eyes, which is not OK.

      In any case, I think it is fair to limit the discussion by saying that the nuclei may have an optical role. Further evidence from hymenopteran and vertebrate literature is controversial. “so that the nuclei act as extra collecting lenses, as was reported for rod cells of nocturnal vertebrates (Solovei et al., 2009; Błaszczak et al., 2014)” - please consider omitting this.

      We thank the Reviewer for this piece of advice. We have rewritten the text to omit the comparison with vertebrates, but have kept the citation as an illustration of the fact that nuclei can perform an optical role.

      “Since the nuclei in DRA and non-DRA ommatidia are arranged differently in cone cells, we suggest that the nuclei of the cone cells of DRA ommatidia in M. viggianii perform some optical role, facilitating the specialization of this group of ommatidia. The optical function for nuclei was described for rod cells of nocturnal vertebrates, where chromatin inside the cell nucleus has a direct effect on light propagation (Solovei et al., 2009; Błaszczak et al., 2014; Feodorova et al., 2020).”

      (3) Please consider comparing the structure and function of ectopic receptors with the eyelet in Drosophila (i.e. https://doi.org/10.1523/JNEUROSCI.22-21-09255.2002 )

      We thank the Reviewer for this advice and have included the comparison fragment into the text:

      “The position of ePR, their morphology and synaptic targets look similar to the eyelet (extraretinal photoreceptor cluster) discovered in Drosophila (Helfrich-Förster et al., 2002). Eyelets are remnants of the larval photoreceptors, Bolwig’s organs in Drosophila (Hofbauer, Buchner, 1989). Unlike Drosophila, Trichogrammatidae are egg parasitoids and their central nervous system differentiation is shifted to the late larva and even early pupa (Makarova et al., 2022). According to the available data on the embryonic development of Trichogrammatidae, no photoreceptor cells were found during the larval stages (Ivanova-Kazas, 1954, 1961).”

      Accordingly, the question of analogy remains open.

      (4) Minor remarks:

      “but also to trace the pathways that connect the analyzer with the brain.” - I find the word analyzer a bit stretched here; sure, the DRA is polarization analyzer, but if the main retina was monochromatic, it would only be a detector, not an analyzer.

      The sentence was changed according to the Reviewer’s advice.

      Table I: thikness -> thickness, wigth -> width

      We have fixed these misprints.

      “The cross-section of Non-DRA ommatidia has a strongly spherical shape” - perhaps circular, not spherical. And not necessary to say “strongly”

      The spelling was changed according to the Reviewer’s advice.

      “which can be rarely visualized in the cell's projections not far from the basement membrane.” - I'd suggest saying “which are nearly absent in retinula axons”

      The spelling was changed according to the Reviewer’s advice.

      “The pigment granules of the retinula cells have an elongated nearly oval shape” - please consider replacing 'elongated nearly oval' with 'prolate' (try googling for “prolate” or “oblate spheroids”; the adjective describes precisely what you wanted to say)

      We thank the Reviewer for this piece of advice but prefer to leave our original phrasing, because it is more readily understandable.

      “The results of our morphological analysis of all ommatidia in Megaphragma are consistent with the light-polarization related features in Hymenoptera and other insects” - please add citations, see my comment on the DRA above.

      We have added the citations according to the Reviewer’s advice.

      “The group of short PRs (R1-R6)” - please consider renaming into “short visual fibre photoreceptors” (as opposed to “long visual fibre PRs”; hence SVFs and LVFs). This naming is quite common.

      The naming was changed according to the Reviewer’s advice.

      “The total rhabdom shortening in M. viggianii ommatidia probably favors polarization and absolute sensitivity,” - please see comments on DRA. Wide rhabdom means also a wider acceptance angle.

      Shortening of DRA rhabdoms does not result in their widening compared to other rhabdoms, so it is difficult to say how this may be related to sensitivity. The comments on DRA given earlier have been taken into account.

      “Ommatidia located across the diagonal area of the eye are more sensitive to light” - I don't understand what is diagonal area.

      We have deleted the sentence.

      “Estimated optical sensitivity of the eyes very close to those reported for diurnal hymenopterans with apposition eyes (Greiner et al., 2004; Gutiérrez et al., 2024) and possess around 0.19 ± 0.04 μm² sr. M. viggianii have reasonably huge values of acceptance angle Δρ, and thus should result in a low spatial resolution” - please correct English here. “eyes IS very close”, “should result in a low”

      The grammatical errors were fixed.

      Table 6 legend: “SPC - secondary pigment cells.” -> “SPC – secondary pigment cells.”

      Citation “(Makarova et al., 2025).” - probably 2015

      The typos were fixed.

      Methods, FIB-SEM: I can't understand the sentence “The volumetric data of lenses and cones, some linear measurements (lens thickness, cone length, cone width, curvature radius) and to visualize the complete 3D-model of eye we use (measure or reconstruct) the elements from another eye (left).”

      The sentence is a continuation of the previous one. We have rewritten it as follows to clarify the meaning and moved it to the 3D reconstruction section:

      “The right eye, on which the reconstruction was performed, has several damaged regions from milling (see Appendix 1C), which hinder the complete reconstructions of lenses and cones on a few ommatidia. According to this, for the volumetric data on lenses and cones, some linear measurements (lens thickness, cone length, cone width, curvature radius), we use (measure or reconstruct) the corresponding elements from the other (left) eye.”

      “The cells of single interfacet bristles were not reconstructed, because of damaging on right eye and worst quality of section on the left.” - please change to “The cells of the single interfacet bristle were not reconstructed, because of damage to the right eye and inferior quality of the sections of the left eye.”

      The text has been changed as follows:

      “The cells of single interfacet bristles were not reconstructed, because of the damage present in the right eye and because of the generally lower quality of this region on the left eye.”

      “Morphometry. Each ommatidia was” -> “Morphometry. Each ommatidium was”

      The grammatical error has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #3 (Recommendations for the authors):

      Major concerns:

      P.6, lines 223-224: The sentence sounds like the authors produced all the OVGP1s by themselves in their laboratories, which is not completely true. The recombinant human and mouse OVGP1s were purchased from OriGene. It is suggested that the authors should state and explain clearly here which OVGP1 is produced by their laboratories and that recombinant human and mouse OVGP1s were obtained and purchased from Origene.

      This is already clearly stated in the Materials and Methods.

      P.6, lines 227-229: The authors stated that "Western blots of the three OVGP1 recombinants indicated expected sizes based on those of the proteins: 75 kDa for human and murine OVGP1 and around 60 kDa for bovine OVGP1 (Fig. 4B and S6)." I pointed out in my last review report that the size of the recombinant human OVGP1 shown by the authors in their manuscript is not in agreement with what has been published previously in the literature regarding the molecular weight of native human OVGP1 as well as that of recombinant human OVGP1. The authors did not address the above concern adequately. In fact, recombinant human OVGP1 was produced a few years ago (Reproduction (2016) 152:561-573) and it has been previously demonstrated that a single protein band of approximately 110-130 kDa was detected for both native human OVGP1 (see Microscopy Research and Technique (1995) 32:57-69) and recombinant human OVGP1 (Reproduction (2016) 152:561-573; Carbohydrate Research (2012) 358:47-55) using antibodies specific for human OVGP1. The molecular weight of the protein core or polypeptide of human OVGP1 is approximately 75 kDa, but the glycosylated form of native human OVGP1 and recombinant human OVGP1 is approximately 110-130 kDa. Therefore, the authors might have been using the recombinant core protein of human OVGP1 instead of the fully glycosylated recombinant OVGP1 in their study. The same concern also applies to the commercially obtained mouse recombinant OVGP1 used by the authors in their study. I would also like to mention that the mature and fully glycosylated OVGP1s in mammals vary in molecular weight (90-95 kDa in domestic animals; 110-150 kDa in primates; 160-350 kDa in rodents). Again, the 75 kDa of mouse OVGP1 detected by the authors could be the core protein or polypeptide of mouse OVGP1 instead of the fully glycosylated mouse OVGP1.

      In our study, as previously mentioned, we included commercially available recombinant proteins from Origene for human and murine OVGP1, which are produced in mammalian cells, and we also produced and purified bovine OVGP1 in mammalian cells. Therefore, these proteins should be properly glycosylated. Moreover, we performed Western blot assays favouring the blotting of higher molecular weight proteins, ensuring the optimal conditions for the assay. Additionally, we tested the size of OVGP1 from murine and bovine oviductal fluids on the same blot. During oestrus, the size of OVGP1 from oviductal fluids matches that of the recombinant proteins, and this band is downregulated during anoestrus, confirming the proper size of recombinant protein.

      P.7, lines 236 and 237: Please provide a figure or source to support the statement "...as confirmed by proteomics of the bands along with PEAKS Studio v11.5 search engine peptide identification software."

      The text states the number of unique peptides identified by proteomics for OVGP1, relative to all protein groups identified.

      P.7, lines 243 to 245: The statement "...using rabbit polyclonal antibody to human OVGP1 for bOVGP1 and endogenous OVGP1, and mouse monoclonal antibody against Flag (DDK)-tag for hOVGP1 and mOVGP1." is confusing and might be inaccurate. First of all, I wondered why the authors did not use an antibody against bovine OVGP1 for the recombinant bOVGP1 instead of using a rabbit polyclonal antibody to human OVGP1. Secondly, what does the "endogenous OVGP1" refer to in the statement? Thirdly, the authors in their study used the commercially available recombinant human OVGP1 and recombinant mouse OVGP1 purchased from Origene. Based on the data sheet provided by Origene, the tag used for both recombinant human OVGP1 and recombinant mouse is C-Myc/DDK-tag and not Flag-tag. Can the authors explain these discrepancies?

      Firstly, for recombinant bOVGP1 we used the same antibody that we used in the Western blot for all the proteins and oviductal fluids, because we do not have an anti-His tag antibody that works for immunofluorescence (the one we had only worked for Western blot), nor do we have any antibody against bovine OVGP1. For the human and murine proteins, we used the anti-Flag antibody, since it worked for both Western blot and immunofluorescence. However, as shown in our figure and supplementary material, the antibody against human OVGP1 works properly for both techniques (Western blot and immunofluorescence). Secondly, endogenous OVGP1 refers to the OVGP1 present in the oviductal fluid. Thirdly, as can be seen in the datasheet, the recombinant proteins purchased from Origene contain a c-Myc tag (EQKLISEEDL), some amino acids, and a DDK tag (DYKDDDDK). The DDK sequence is identical to that of the Flag tag (DYKDDDDK). Since the proteins carry both tags, we used the antibody against the Flag (or DDK) epitope.

      P12, lines 429-432: The newly added statement at the end of the Discussion saying "Additionally, future studies would be valuable to investigate whether incubating oocytes with oviductal fluid (or OVGP1) could reduce polyspermy in porcine IVF and whether ZPs could be leveraged to naturally enhance sperm selection in human ICSI" is very concerning and requires further attention. The statement reflects that the authors do not keep pace with and do not pay attention to what has been published in literature regarding porcine and human OVGP1s. In fact, porcine oviduct-specific glycoprotein (OVGP1) has already been reported to reduce the incidence of polyspermy in pig oocytes (Biology of Reproduction (2000) 63:242-250). Porcine oviductal fluid, used in porcine IVF, has also been found to exert a beneficial effect on oocytes by reducing the incidence of polyspermy without decreasing the penetration rate. (Theriogenology (2016) 86:495-502). Therefore, the studies deemed valuable by the authors to be investigated in the future have, in fact, already been carried out two decades ago by several other laboratories. I am surprised the authors were not aware of these published work in literature. All the above should have been incorporated in the Discussion.

      This sentence has been modified in the Discussion and the references have been included.

      Furthermore, as mentioned earlier, recombinant human OVGP1 has also been produced (Reproduction (2016) 152:561-573), and recombinant human OVGP1 has been found to increase tyrosine phosphorylation of sperm proteins, a biochemical hallmark of sperm capacitation, and potentiate the subsequent acrosome reaction (Reproduction (2016) 152:561-573) as well as increase sperm-zona binding (Journal of Assisted Reproduction and Genetics (2019) 36:1363-1377). These earlier findings should be incorporated into the Discussion.

      Thank you for your comment. However, we did not perform any experiments related to tyrosine phosphorylation in this work; although it is a very interesting topic, it is not directly related to this study.

      P.19, lines 678-683: Since the human and mouse recombinant oviductin proteins were purchased from Origene, the authors should be aware of the fact that these commercially available recombinant OVGP1s might not be fully glycosylated. While I appreciate the fact that the authors wanted to briefly describe how the human and mouse recombinant OVGP1s were prepared by the manufacturer, I strongly suggest that the authors should contact Origene, the manufacturer, for all information regarding the procedures for producing the human and mouse recombinant oviductin proteins. For example, the authors stated on lines 680-681 that "A sequence expressing FLAG-tagged epitope proteins (DYKDDDDK) was cloned into an expression vector." According to the data sheet provided by Origene, it appears that both human and recombinant oviductin proteins are C-Myc/DDK-tagged and not FLAG-tagged.

      Thank you for your comment. The sequence of the Flag tag matches the sequence of the DDK tag given in the datasheet (as detailed in our response to a previous comment). In addition, the protein also carries a c-Myc tag. Of the two tags, we chose to detect the protein with the anti-Flag antibody.

      P.19, lines 692-697: The description of the primary and secondary antibodies used for detection of the various recombinant OVGP1s is also very confusing and not clearly presented. For example, it is mentioned here that "...membranes were...incubated with anti-OVGP1 rabbit monoclonal antibody for OVGP1,..". What specifically does "OVGP1" refer to here? The authors then stated that anti-Histamine Tag antibody was used to detect bOVGP1 and mOVGP1 and anti-Flag antibody was used to detect hOVGP1. As pointed out earlier, the human and mouse recombinant OVGP1s were produced using C-Myc/DDK tag and not His-tag or Flag-tag. Can the authors clarify these discrepancies?

      We apologise for the confusion regarding the antibodies. This paragraph lists the antibodies used for Western blot in both figures: the anti-human OVGP1 antibody was used for the main figure, which contains the three recombinant proteins and the oviductal fluids; the anti-Histidine and anti-Flag antibodies were used for the supplementary figure, specifically for recombinant bovine OVGP1 (Histidine tag) and for recombinant murine and human OVGP1 (DDK tag). A clarifying sentence has been included in the text.

      P.31, lines 1143-1149: Figure 10 is not mentioned anywhere in the main text of the manuscript. Rewrite the second half of the sentence "...; being this specificity lost when OVGP1 is heterologous to the ZP (right diagram)." Which sounds awkward and grammatically not correct.

      Thank you for your comment. The figure is already mentioned in the text, and the sentence has been corrected.

      Other comments: P.1, the statement of "All authors contributed equally to this work" on line 14 can be deleted because detailed and specific contributions from each authors are listed in lines 1009-1017 on page 27.

      Both authors contributed equally to this work; this is now clear in the author contributions section.

      P.2, lines 43 and 44: Do the authors mean "sperm-oocyte binding protein" instead of "sperm-oocyte fusion protein" in the sentence? "Fusion protein" is a protein composed of two or more domains encoded by different genes, or a hybrid molecule created by combining two different proteins for various purposes. I believe the term "fusion protein" is wrongly used in the sentence which should be rephrased with a proper term.

      Done.

      P2, line 73: Remove the comma after the word "Both".

      Done.

      P.5, line 179: "...mice ZP..." should be written as "...mouse ZP...".  

      Done.

      P.6, heading of 3rd paragraph on line 207: The term "binding" will be a better term than "fusion" used in the heading because the results do not actually show the fusion of the OVGP1 proteins with the ZP glycoprotein. Instead, binding of the OVGP1 proteins to the ZP occurred.

      Done.

      P.6, lines 215-217: Authors, please provide a reference or references to support the statement "Region A, corresponding to the amino acid end, shows high identity among monotremes, marsupials and placentals."

      The text cited a review (29) that supports this statement for Figure 4. Moreover, we have included some of the references used to describe the domains when performing the sequence alignment in Figure S5.

      P.6, line 230 and line 233 on P.7: Authors, please be consistent in the use of either American English or British English. The word "oestrus" is British English whereas "estrus" is American English.

      Done.

      P.7, line 264: The word "sticking" used here means non-specific binding. I believe the author means specific binding here. If so, a more appropriate word should be used here instead of "sticking".

      Done.

      P.7, lines 267-269: This newly added sentence sounds very awkward and should be completely rewritten.

      Done.

      P.8, line 288: This reviewer finds it difficult to understand the meaning of the heading. The heading should be rephrased to bring out exactly what the authors want to say in well-written English.

      Done.

      P.8, line 290: The word "would" should be replaced by "could" in the sentence.

      Done.

      P.13, line 437: Authors, please provide the location of Sigma-Aldrich.

      Done.

      P.13, line 457: Here, the authors used "1800 rpm" to indicate the centrifugation speed but used the g-force elsewhere in the Materials and Methods. Please be consistent. The g-force is preferred.

      Done.
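
      The conversion requested in the comment above (centrifugation speed in rpm to relative centrifugal force in g) can be sketched as follows. This is a minimal illustration only: the rotor radius is a hypothetical value, since the source does not state which rotor was used, and the function names are introduced here for illustration.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (in multiples of g) at a given rotor radius."""
    omega = 2.0 * math.pi * rpm / 60.0          # angular velocity in rad/s
    return omega ** 2 * (radius_cm / 100.0) / STANDARD_GRAVITY


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Inverse conversion: rotor speed (rpm) needed to reach a given g-force."""
    omega = math.sqrt(rcf * STANDARD_GRAVITY / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


# Hypothetical rotor radius; the real value depends on the centrifuge used.
assumed_radius_cm = 10.0
rcf = rpm_to_rcf(1800, assumed_radius_cm)
print(f"1800 rpm at r = {assumed_radius_cm} cm is about {rcf:.0f} x g")
```

      Because the result scales linearly with the radius, the same rpm value corresponds to different g-forces on different rotors, which is why reporting the g-force is preferred.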

      P.14, lines 483-485: The procedure of sacrificing the cats should be provided in the Materials and Methods

      The cats were not sacrificed; they were vasectomized. This is now stated in the text.

      P.17, line 628: "...the ZPs were exposed or no exposed to..." should be written as "...the ZPs were either exposed or not exposed to...".

      Done.

      P.17, line 629: "...each groups were incubated with..." should be "...each group was incubated with...".

      Done.

      P.19, line 700: "As loading control, was used the primary antibody....." is not a complete sentence and it needs to be rewritten.

      Done.

      P.20, lines 744-754: For scanning electron microscopy and image processing, the procedures of prior treatment of the oocytes with and without oviductal fluid and OVGP1 should be included here.

      Done.

      P.21, line 756: It is stated here that "Two hundred isolated ZPs were treated with Clostridium perfringens neuraminidase....". However, it is not clear whether two hundred isolated ZPs of both porcine and murine ZPs were treated. Authors, please clarify.

      We used 200 isolated ZPs of each species, bovine and murine. This is now clarified in the text.

      P.28, lines 1039 and 1040: The author only mentioned the use of bovine and murine sperm here. What about human sperm?

      Done.

      P.29, line 1076: "...in mammalian cells..." is very vague. Be specific what exactly the mammalian cells were.

      Done.

      P.29, line 1079: "Oviductal fluid from ovulated cows or anoestrus cows." is not a complete sentence and it needs to be rewritten.

      Done.

    1. Author response:

      Conflation of control, difficulty and reward rate

      The reviewer commented that control is conflated with task difficulty (and thus reward rate) and that this is not adequately discussed in the paper. We will expand on this point in our discussion, especially in relation to previous literature. It is important to note, however, that our measure of perceived difficulty was included in analyses assessing the fluctuations in stress and control. Subjective control still had a unique effect on the experience of stress over and above perceived difficulty, suggesting that subjective control explains variance in stress beyond what is accounted for by perceived difficulty. We will also include additional analyses in which the win rate (i.e. percentage of all trials won) is a covariate when assessing the relationship between subjective control, perceived difficulty and subjective stress. These analyses show that win rate does not predict stress, but subjective control and perceived difficulty still uniquely predict subjective stress. The results will be added and elaborated further in the discussion.

      Neutral video condition

      In response to the comment that the neutral video condition was not active enough, we believe that any task with action-outcome contingencies would have a degree of controllability. To better distinguish experiences of control (WS task) from an experience of no/neutral control (i.e., neither high nor low controllability), we decided to use a task in which no actions were required during the task itself, although concentration was still required (attention checks regarding the content of the videos and ratings of the videos).

      The suggestion of having a high arousal video condition would indeed be interesting to test how experiencing ‘neutral’ control and high(er) stress levels preceding the stressor task influences stress buffering and stress relief. This is a good suggestion for future work that we can include in the discussion section.

      The TSST version (online and anticipatory)

      We will add more information on prior literature in which the Trier Social Anticipatory Stress test has shown physiological and psychological correlates (e.g. Nasso et al., 2019, Schlatter et al., 2021, Steinbeis et al., 2015), suggesting that anticipation is still a valid stress manipulation even though participants do not perform the actual speech task. Further, the TSST had a significant impact on subjective stress in the expected direction, demonstrating that it was effective at eliciting subjective stress.

      Internal consistency

      We will parcellate the timepoints differently (not just odd/even sliders) to test the internal consistency, for example a random split or first half/second half.
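
      The alternative splits mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline of the study: the toy ratings, the function names, and the use of a Spearman-Brown corrected split-half correlation are all assumptions introduced here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def make_splits(n_timepoints, seed=0):
    """Three ways of dividing timepoint indices into two halves."""
    idx = list(range(n_timepoints))
    shuffled = idx[:]
    random.Random(seed).shuffle(shuffled)
    half = n_timepoints // 2
    return {
        "odd_even": (idx[0::2], idx[1::2]),
        "first_second": (idx[:half], idx[half:]),
        "random": (sorted(shuffled[:half]), sorted(shuffled[half:])),
    }


def split_half_reliability(ratings, split):
    """Spearman-Brown corrected correlation between the two half means."""
    half_a, half_b = split
    mean_a = [sum(p[i] for i in half_a) / len(half_a) for p in ratings]
    mean_b = [sum(p[i] for i in half_b) / len(half_b) for p in ratings]
    r = pearson(mean_a, mean_b)
    return 2 * r / (1 + r)


# Hypothetical data: one row per participant, one column per rating timepoint.
toy_ratings = [
    [20, 25, 22, 27, 21, 26],
    [50, 55, 52, 58, 51, 54],
    [70, 78, 72, 75, 71, 77],
    [35, 40, 33, 41, 36, 39],
]
for name, split in make_splits(6).items():
    print(name, round(split_half_reliability(toy_ratings, split), 3))
```

      Comparing the three estimates gives a rough check on whether the reliability depends on how the timepoints are divided.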

      Effect of win-loss domain in Study 2

      We will run additional analyses testing the interaction of Domain (win or loss) with stressor intensity when predicting the stress buffering and stress relief effects. To test whether the loss domain is more valuable at mitigating experiences of stress than the win condition, we will run additional analyses with just the high control conditions (WS task) to test for a Domain*Time interaction, as we cannot test a Control*Domain*Time interaction in the full model given that we do not have ‘Domain’ for the video (neutral control) condition.

      Stress relief analyses

      Regarding the stress relief analyses (timepoints 2 and 3) and ‘baseline’ stress (timepoint 1), we will add to the manuscript that there is no significant difference in stress ratings between the high control and neutral control conditions (collapsed across stress and domain) after the WS/video task, which is why we do not think it is necessary to include timepoint 1 in the stress relief model. Nevertheless, we will include a sensitivity analysis in the supplementary material to test the Timepoint*Control interaction (of stress relief – timepoints 2 and 3) when including timepoint 1 stress as a covariate.

      Clarity

      We will add more clarity in the methods section regarding within- and between-subject manipulations. We will also add Figure S4 to the main manuscript and expand Figure 1 to include both Studies 1 and 2 and a timeline of when subjective stress was assessed throughout the experiment.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Busch and Hansel present a morphological and histological comparison between mouse and human Purkinje cells (PCs) in the cerebellum. The study reveals species-specific differences that have not previously been reported despite numerous observations of these species. While mouse PCs show morphological heterogeneity and occasional multi-innervation by climbing fibers (CFs), human PCs exhibit a widespread, multi-dendritic structure that exceeds expectations based on allometric scaling. Specifically, human PCs are significantly larger, and exhibit increased spine density, with a unique cluster-like morphology not found in mice.

      Strengths:

      The manuscript provides an exceptionally detailed analysis of PC morphology across species, surpassing any prior publication. Major strengths include a systematic and thorough methodology, rigorous data analysis, and clear presentation of results. This work is likely to become the go-to resource for quantitation in this field. The authors have largely achieved their aims, with the results effectively supporting their conclusions.

      We are grateful to this reviewer for their thoughtful assessment that this work will be a go-to resource for the field.

      Weaknesses:

      There are a few concerns that need to be addressed, specifically related to details of the methodology as well as data interpretation based on the limits of some experimental approaches. Overall, these weaknesses are minor.

      We thank this reviewer for their careful reading of the manuscript and for highlighting limitations and weaknesses in the methodology. We are in full agreement that while interpretation is somewhat limited, there is still value in their description. As detailed below in response to this reviewer’s recommendations, we provide more description of our imaging resolution. This additional detail clarifies that our quantitation is appropriate for the scale of the objects being measured and provides critical information to help readers assess the findings as they may pertain to their own work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript aims to follow up on a previously published paper (Busch and Hansel 2023) which proposed that the morphological variation of dendritic bifurcation in Purkinje cells in mice and humans is indicative of the number of climbing fiber inputs, with dendritic bifurcation at the level of the soma resulting in a proportion of these neurons being multi-innervated. The functional and anatomical climbing fiber data was obtained solely from mice since all human tissue was embalmed and fixed, and the extension of these findings to human Purkinje cells was indirect. The current comparative anatomy study aims to resolve this question in human tissue more directly and to further analyse in detail the properties of adult human Purkinje cell dendritic morphology.

      Strengths:

      The authors have carried out a meticulous anatomical quantification of human Purkinje cell dendrites, in tissue preparations with a better signal-to-noise ratio than their previous study, comparing them with those from mice. Importantly, they now present immunolabelling results that trace climbing fiber axons innervating human PCs. As well as providing detailed analyses of spine properties and interesting new findings of human PC dendritic length and spine types, the work confirms that human PCs that have two clearly distinct dendritic branches have an approximately x% chance of receiving more than one CF input, segregated across the two branches. Albeit entirely observational, the data will be of widespread interest to the cerebellar field, in particular, those building computational models of Purkinje cells.

      We thank this reviewer for their positive and considered assessment of our work. We enthusiastically agree that while these data are descriptive in nature, they may be of interest across modalities of cerebellar research and will provide a more detailed framework for cross-species comparisons and single cell computational modeling, which remains a critical tool to explore the human case given the inaccessibility of physiological experimentation.

      Weaknesses:

      The work is, by necessity, purely anatomical. It remains to be seen whether there are any functional differences in ion channel expression or functional mapping of granule inputs to human PCs compared with the mouse that might mitigate the major differences in electronic properties suggested.

      We are in full agreement with the reviewer that the focused anatomical description of this manuscript could not make strong assertions about function given that cellular and circuit physiology is determined by many additional factors that remain unexamined. We appreciate that the reviewer acknowledges that this is out of necessity as those factors are inaccessible to experimentation at the current time; however, we are enthusiastic that our current findings will motivate future work that will shed light on these critical additional features of the system, both in rodents and humans.

      Reviewer 1 (Recommendations for the authors):

      PCs are now known to be genetically diverse, with unique PC types found only in humans. Could this cellular diversity contribute to the differences observed between species in this study? This possibility should be at least discussed in the context of the findings.

      We agree that this is a fascinating possibility. Perhaps the most detailed recent study (Sepp et al., Nature 625, 2024) describes, in a conservative assessment, four developmental PC subtypes in mice that are identical in humans. The study points out, though, that the subtype ratio changes over the course of development. Taken together with the possibility of additional human-specific subtypes, this suggests a possible genetic basis for morphological as well as physiological diversity. This is now discussed on p. 7. It needs to be kept in mind, however, that other factors, such as push-pull influences during tissue growth, might also play a role.

      The human tissue used in this study was obtained from elderly individuals, while the mouse tissue was not. It is unclear whether the age difference might influence the findings, and this warrants further discussion or control.

      We share this concern, in particular regarding the spine / spine cluster analysis, as here tissue quality and/or degenerative effects might play a role. We additionally analyzed a tissue sample from a 37-year-old human, and observed the same spine clusters as in the other human brains. This is now described on p. 4 of the revised manuscript.

      The study includes spine size comparisons, but it is not clear if the point spread function (PSF) of the microscope provides the necessary resolution for these quantitative assessments. For instance, are multi-headed spines truly multi-headed, or could this be an artifact of limited resolution?

      This is an important point. We addressed it by calculating the Rayleigh limit (more conservative than the Abbe limit) as 248.4 nm for the equipment and conditions used (Methods, p. 22). We updated our Results section accordingly to point out which quantifications are well supported and to discuss the limitations (p. 3-5).
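
      For readers who want to check such a resolution estimate for their own setup, the two criteria named above can be sketched as follows. The wavelength and numerical aperture used here are illustrative assumptions and are not taken from the manuscript; the formulas are the standard lateral Rayleigh (0.61 λ/NA) and Abbe (0.5 λ/NA) criteria.

```python
def rayleigh_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Lateral Rayleigh criterion: 0.61 * wavelength / NA."""
    return 0.61 * wavelength_nm / numerical_aperture


def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Lateral Abbe criterion: 0.5 * wavelength / NA."""
    return 0.5 * wavelength_nm / numerical_aperture


# Assumed imaging parameters, chosen only for illustration.
assumed_wavelength_nm = 570.0
assumed_na = 1.4

rayleigh = rayleigh_limit_nm(assumed_wavelength_nm, assumed_na)
abbe = abbe_limit_nm(assumed_wavelength_nm, assumed_na)
print(f"Rayleigh limit: {rayleigh:.1f} nm, Abbe limit: {abbe:.1f} nm")
```

      For any wavelength and aperture the Rayleigh value is larger than the Abbe value, which is the sense in which it is the more conservative of the two: structures smaller than this distance should not be treated as resolved.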

      Reviewer 2 (Recommendations for the authors):

      This is nice work which must have been very time-consuming. It would be good to make sure that the technical details are properly discussed, to quantify the data properly. Please include details of how you measured the resolution of the microscope used to evaluate spine size.

      See our response to the last comment of Referee 1 above.

      The figure panels are mostly satisfactory, but they are exceptionally crowded and will probably be difficult to read at the final size. Some work tidying these would be worth it. In Figure 3B, include mention of open and blue triangles in legend. In 3E, the dendritic branches are shown at a different gray scale. You have not done this elsewhere, so probably good to mention it in the legend.

      Figure 3 and its legend have been updated / improved accordingly.

      The definition of horizontal and vertical is not absolutely clear. Perhaps re-assess this bit of the text. Does it mean that you did not include cells that were neither vertical nor horizontal?

      We categorized those PCs as ‘vertical’ that have a >30° angle relative to the PC layer, and those as ‘horizontal’ that have a <30° angle relative to the PC layer. All PCs are covered by these categories. This is now described on p. 5.
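
      The categorization rule described above can be written as a short sketch. The function name and the handling of a cell lying at exactly 30° are assumptions introduced here, since the response only specifies angles above and below the threshold.

```python
def classify_pc_orientation(angle_deg: float, threshold_deg: float = 30.0) -> str:
    """Label a Purkinje cell by its angle relative to the PC layer.

    Assumption: a cell at exactly the threshold is counted as 'vertical';
    the source text does not say how this boundary case was handled.
    """
    if not 0.0 <= angle_deg <= 90.0:
        raise ValueError("angle relative to the PC layer must be within 0-90 degrees")
    return "vertical" if angle_deg >= threshold_deg else "horizontal"


# Hypothetical example angles, in degrees.
for angle in (5.0, 29.9, 45.0, 90.0):
    print(angle, classify_pc_orientation(angle))
```

      With a single threshold, every cell falls into one of the two categories, consistent with the statement that all PCs are covered.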

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #2 suggested the addition of new data to address the following points:

      Reviewer #2: 

      (1) Oncogenic GOF - the main data shown for GOF are the survival curve and enhanced metastasis. Often, GOF is exemplified at the cellular level as enhanced migration and invasion, which are standard assays to support the GOF. As such, the authors should perform these assays using either tumor cells derived from the mice or transformed fibroblasts from these mice. This will provide important and confirmatory evidence for GOF for Y217C. 

      We thank the referee for this comment. Our previous data indicated accelerated tumor progression and increased metastasis in Trp53<sup>Y217C/Y217C</sup> mice, which provided in vivo evidence of an oncogenic gain of function (GOF) for the p53<sup>Y217C</sup> mutant. However, we agree that it was important to provide additional evidence of GOF at the cellular level. 

      Many cellular assays were previously used to evaluate the GOF of p53 mutants, including those listed by the referee. Importantly, Zhao et al. recently showed that a common property of several p53 mutants proposed to have oncogenic GOF is their capacity to promote chromosomal instability (Zhao et al. (2024) Nat. Commun. 15, 180). For the revision of our manuscript, we compared the frequencies of chromosomal alterations occurring spontaneously in WT, Trp53<sup>Y217C/Y217C</sup> and Trp53<sup>-/-</sup> mouse embryonic fibroblasts (MEFs). Chromosome breaks, radial chromosomes and DMs were more frequent in Trp53<sup>Y217C/Y217C</sup> MEFs than in WT or Trp53<sup>-/-</sup> MEFs, providing clear evidence of a GOF promoting chromosomal instability. This new result is presented in Figure 2G and mentioned in the revised abstract. 

      Furthermore, as pointed out by referee #1 in a confidential comment, increased NF-kB signaling provides evidence of p53 GOF. Accordingly, Zhao et al. proposed that the capacity of p53<sup>G245D</sup> and p53<sup>R273H</sup> to promote chromosomal instability ultimately led to activation of a noncanonical NF-kB signaling that would promote tumor cell invasion and metastasis. Consistent with their work, we now report that the GSEA of Trp53<sup>Y217C/Y217C</sup> and Trp53<sup>-/-</sup> thymocytes revealed an upregulation of non-canonical NF-kB signaling in Trp53<sup>Y217C/Y217C</sup> thymic cells (a new result presented in Figure 5F and Supplementary Figure S13).  These new data lead us to mention in the revised discussion that “similar mechanisms might underlie the oncogenic properties of the p53<sup>Y217C</sup>, p53<sup>G245D</sup> and p53<sup>R273H</sup> mutants”.

      (2) Novel target gene activation - while a set of novel targets appears to be increased in the Y217C cells compared to the p53 null cells, it is unclear how they are induced. The authors should examine if mutant p53 can bind to their promoters through CHIP assays, and, if these targets are specific to Y217C and not the other hot-spot mutations. This will strengthen the validity of the Y217C's ability to promote GOF. 

      We respectfully disagree with the referee when he/she considers that the validity of p53<sup>Y217C</sup>’s ability to promote a GOF would be strengthened by showing that p53<sup>Y217C</sup> binds to the promoters of genes upregulated in Trp53<sup>Y217C/Y217C</sup> cells. In fact, Pal et al. recently performed the experiment proposed by the referee, by integrating RNAseq and ChIPseq data from MCF10A cells expressing p53<sup>Y220C</sup>, the human equivalent of p53<sup>Y217C</sup>, and found that 95% of the genes upregulated upon p53<sup>Y220C</sup> expression were upregulated indirectly, without p53<sup>Y220C</sup> binding to their promoters (Pal et al. (2023) NPJ Breast Cancer 9, 78). Consistent with our data, Pal et al. notably found that the expression of p53<sup>Y220C</sup> increased cell migration and invasion, which correlated with an increased expression of S100A8 and S100A9. They found that the promoters of S100A8 and S100A9 were however not bound by p53<sup>Y220C</sup>, indicating an indirect mechanism for their upregulated expression. Furthermore, the study by Zhao et al. mentioned above also suggested an indirect mechanism of GOF, because the upregulation of inflammation-related genes by a mutant p53 protein was proposed to result from signaling cascades triggered by chromosomal instability. Our data appear consistent with both studies, because p53<sup>Y217C</sup> was undetectable or barely detectable in the chromatin fraction of Trp53<sup>Y217C/Y217C</sup> cells, and because Trp53<sup>Y217C/Y217C</sup> cells exhibited increased chromosome instability and increased NF-kB signaling compared to Trp53<sup>-/-</sup> cells, which may suggest indirect mechanisms for p53<sup>Y217C</sup> GOF.

      Nevertheless, we agree with the referee that it was important to provide stronger evidence of p53<sup>Y217C</sup> GOF in the revised manuscript. In that regard, we were intrigued by the perinatal death of most Trp53<sup>Y217C/Y217C</sup> females, which provided evidence of unexpected teratogenic effects of the mutant. We had proposed that these female-specific teratogenic effects likely resulted from pro-inflammatory GOF of p53<sup>Y217C</sup>. This hypothesis relied on the RNAseq pro-inflammatory signature in Trp53<sup>Y217C/Y217C</sup> thymic cells, and on the fact that the glycoprotein CD44, known to drive inflammation, had been identified as a key gene in open neural tube defects. However, we had not tested this hypothesis experimentally. In the revised version of the manuscript, we tested this hypothesis. We mated Trp53<sup>+/Y217C</sup> female mice with Trp53<sup>Y217C/Y217C</sup> males, then administered supformin (LCC-12), a potent CD44 inhibitor known to attenuate inflammation in vivo, to pregnant mice by oral gavage. The administration of supformin led to a five-fold increase in the proportion of weaned Trp53<sup>Y217C/Y217C</sup> females in the progeny, suggesting that reducing inflammation in utero rescued some of the Trp53<sup>Y217C/Y217C</sup> female embryos. This new result is presented in Figure 5G and Supplementary Table S6, and mentioned in the abstract.

      We believe that these new results, as well as the additional GSEA analyses revealing increased NF-kB signaling in Trp53<sup>Y217C/Y217C</sup> cells, further emphasize the importance of inflammation in the GOF of the p53<sup>Y217C</sup> mutant. Accordingly, we slightly modified the title of our article, to include the notion that Trp53<sup>Y217C</sup> is an inflammation-prone mouse model. We also end the article by summarizing the effects of p53<sup>Y217C</sup> in vivo, in a new Supplementary Table S7 that compares the LOF effects of a p53 KO with the (LOF+GOF) effects of the p53<sup>Y217C</sup> mutant.

      (3) Dominant negative effect - the authors' claim of lack of DN effect needs to be strengthened further, as most p53 hot-spot mutations do exhibit DN effect. At the minimum, the authors should perform additional treatment with nutlin and gamma irradiation (or cytotoxic/damaging agents) and examine a set of canonical p53 target genes by qRT-PCR to strengthen their claim. 

      Our previous data indicated identical tumor onset and survival in Trp53<sup>+/Y217C</sup> and Trp53<sup>+/-</sup> mice, leading us to conclude that, at least for spontaneous tumorigenesis, there was no evidence of a Dominant Negative Effect (DNE) in vivo. Here, we followed the referee’s suggestion and evaluated the possibility of a DNE in response to stress, by comparing WT, Trp53<sup>+/Y217C</sup> and Trp53<sup>+/-</sup> MEFs or thymocytes. We analyzed different types of stress (Nutlin, Doxorubicin, γ-irradiation) and different types of cellular responses (transactivation of classical p53 target genes, cell cycle arrest, apoptosis), and the results lead us to conclude that there is little if any DNE also in response to various stresses. These new data are mentioned in a paragraph evaluating the possibility of DNE or GOF at the cellular level, and presented in a new Supplementary Figure S6.

    1. Author response:

      We thank the reviewers of this manuscript for their thoughtful and detailed feedback, and agree that they bring up valid points. We also thank them for their suggestions on how to improve this study. We intend to revise this manuscript to help address these concerns and in the future will submit a revised version that will hopefully be improved in terms of the clarity of the text and rigor of the experimental findings.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      In this valuable study, García-Vázquez et al. provide solid evidence suggesting that G2 and S phases expressed protein 1 (GTSE1), is a previously unappreciated non-pocket substrate of cyclin D1-CDK4/6 kinases. To this end, this study holds a promise to significantly contribute to an improved understanding of the mechanisms underpinning cell cycle progression. Notwithstanding these clear strengths of the article, it was thought that the study may benefit from establishing the precise role of cyclin D1-CDK4/6 kinase-dependent GTSE1 phosphorylation in the context of cell cycle progression, …

      We do not claim, as editors and reviewers appear to have interpreted, that GTSE1 is phosphorylated by cyclin D1-CDK4 in the G1 phase of the cell cycle under normal physiologic conditions. Indeed, we agree with the existing literature indicating that in cells that do not express high levels of cyclin D1, GTSE1 is expressed predominantly during S and G2 phase (hence the name GTSE1, which stands for G-Two and S phases expressed protein 1) and is phosphorylated by mitotic cyclins in early mitosis. Even during G1, when the levels of cyclin D1 peak, GTSE1 is not phosphorylated in normal cells. This could be due to either a higher affinity between GTSE1 and mitotic cyclins as compared to D-type cyclins or to a higher concentration of mitotic cyclins compared to D-type cyclins. In the current manuscript, we show that higher levels of cyclin D1 can drive the sustained phosphorylation of GTSE1 across all cell cycle points. To reach this conclusion, we do not rely only on the overexpression of exogenous cyclin D1. In fact, we observe a similar effect when we deplete endogenous AMBRA1, resulting in the stabilization of endogenous cyclin D1 in all cell cycle phases (see Figure 2G and Figure supplement 3B). As we had already mentioned in the Discussion section, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins (i.e., it is possible that the overexpression overcomes the lower affinity of the cyclin D-GTSE1 complex). In turn, phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. So, the role of the cyclin D1-CDK4/6 kinase-dependent GTSE1 phosphorylation is to stabilize GTSE1 independently of the cell cycle.
      In sum, our study suggests that overexpression of cyclin D1, which is often observed in cancer cells beyond the G1 phase, induces phosphorylation of GTSE1 at all points in the cell cycle.

      … obtaining more direct evidence that cyclin D1-CDK4/6 kinase phosphorylate indicated sites on GTSE1 (e.g., S454) …

      We show that treatment of cells with palbociclib completely abolished the effect of cyclin D1-CDK4 on the GTSE1 shift observed using Phos-tag gels (Figure 2H). Moreover, mutagenesis analysis shows that S91, S262, and S724 are phosphorylated in a cyclin D1-CDK4-dependent manner (Figure 2F and Figure supplement 3A). Compared to wild-type GTSE1, a triple mutant (S91A/S262A/S724A) displayed loss of slower-migrating bands upon co-expression of cyclin D1-CDK4, suggesting diminished phosphorylation. Nevertheless, a residual slow-migrating band persisted, prompting further mutations of the triple GTSE1 mutant in S331 and S454 (individually), which do not have a CDK-phosphorylation consensus, but were identified in several published phospho-proteomics studies. Of these two quadruple mutants, only the one containing the S454A mutation demonstrated a complete abrogation of any shift in Phos-tag™ gels (Figure 2F). These studies suggest that four major sites (S91, S262, S454, and S724) are phosphorylated (either directly and/or indirectly) in a cyclin D1-CDK4-dependent manner.

      … and mapping a degron in GTSE1 whose function may be blocked by cyclin D1-CDK4/6 kinase-dependent phosphorylation.

      We show that stabilization or overexpression of cyclin D1, which is often observed in human cancers, promotes GTSE1 phosphorylation on S91, S262, S454, and S724, resulting in GTSE1 stabilization. Similarly, a phospho-mimicking mutant with the 4 serine residues at positions 91, 262, 454, and 724 replaced with aspartate displays increased half-life. While we appreciate the editor’s suggestion and agree on these being interesting questions, we would like to respectfully point out that mapping the GTSE1 degron and understanding how it is affected by cyclin D1-CDK4/6-dependent phosphorylation is outside the scope of the current project and will require an extensive set of experiments and tools. Accordingly, the three reviewers did not ask to map the GTSE1 degron. We plan on addressing these interesting questions as part of a follow-up study.

      Reviewer #1 (public review):

      Summary:

      García-Vázquez et al. identify GTSE1 as a novel target of the cyclin D1-CDK4/6 kinases. The authors show that GTSE1 is phosphorylated at four distinct serine residues and that this phosphorylation stabilizes GTSE1 protein levels to promote proliferation.

      Strengths:

      The authors support their findings with several previously published results, including databases. In addition, the authors perform a wide range of experiments to support their findings.

      Weaknesses:

      I feel that important controls and considerations in the context of the cell cycle are missing. Cyclin D1 overexpression, Palbociclib treatment and apparently also AMBRA1 depletion can lead to major changes in cell cycle distribution, which could strongly influence many of the observed effects on the cell cycle protein GTSE1. It is therefore important that the authors assess such changes and normalize their results accordingly.

      We have approached the question of GTSE1 phosphorylation to account for potential cell cycle effects from multiple angles: 

      (i) We conducted in vitro experiments with purified, recombinant proteins and showed that GTSE1 is phosphorylated by cyclin D1-CDK4 in a cell-free system (Figure 2A-C). These experiments provide direct evidence of GTSE1 phosphorylation by cyclin D1-CDK4 without the influence of any other cell cycle effectors.

      (ii) We present data using synchronized AMBRA1 KO cells (new Figure 2G and Figure supplement 3B). In agreement with what we had shown previously (Simoneschi et al., Nature 2021, PMC8875297), AMBRA1 KO cells progress faster in the cell cycle but they are still synchronized as shown, for example, by the mitotic phosphorylation of Histone H3, peaking at 32 hours after serum readdition like in parental cells. Under these conditions we observed that while phosphorylation of GTSE1 in parental cells is evident in the last two time points, AMBRA1 KO cells exhibited sustained phosphorylation of GTSE1 across all cell cycle phases. This was evident enough when using Phos-tag gels as in the top panel of the old Figure 2G. We have now re-run one of the biological triplicates of the synchronized cells using a higher concentration of Zn<sup>2+</sup>-Phos-tag reagent and lower voltage to allow better separation of the phosphorylated bands. Under these conditions, GTSE1 phosphorylation is more readily appreciable (top panel of the new Figure 2G). This experiment provides evidence that high levels of cyclin D1 in AMBRA1 KO cells affect GTSE1 phosphorylation independently of the specific points in the cell cycle.

      (iii) The relatively short half-life of GTSE1 (<4 hours) makes its levels sensitive to acute treatments such as Palbociclib or acute AMBRA1 depletion. The effects of these treatments on GTSE1 levels are measurable within a time frame too short to significantly affect cell cycle progression. For example, we used cells with fusion of endogenous AMBRA1 to a mini-Auxin Inducible Degron (mAID) at the N-terminus. This system allows for rapid and inducible degradation of AMBRA1 upon addition of auxin, thereby minimizing compensatory cellular rewiring. Again, we observed an increase in GTSE1 levels upon acute ablation of AMBRA1 (i.e., in 8 hours) (Figure 3B), when no significant effects on cell cycle distribution are observed (please see Simoneschi et al., Nature 2021, PMC8875297 and Rona et al., Mol. Cell 2024, PMC10997477).

      Altogether, the above lines of evidence support our conclusion that GTSE1 is a target of cyclin D1-CDK4, independent of cell cycle effects.

      In conclusion, we do not claim that GTSE1 is phosphorylated by cyclin D1-CDK4 in the G1 phase of the cell cycle under normal physiologic conditions. Indeed, we agree with the existing literature indicating that in cells that do not express high levels of cyclin D1, GTSE1 is expressed predominantly during S and G2 phase (hence the name GTSE1, which stands for G-Two and S phases expressed protein 1) and is phosphorylated by mitotic cyclins in early mitosis. Even during G1, when the levels of cyclin D1 peak, GTSE1 is not phosphorylated in normal cells. This could be due to either a higher affinity between GTSE1 and mitotic cyclins as compared to D-type cyclins or to a higher concentration of mitotic cyclins compared to D-type cyclins. In the current manuscript, we show that higher levels of cyclin D1 can drive the sustained phosphorylation of GTSE1 across all cell cycle points. To reach this conclusion, we do not rely only on the overexpression of exogenous cyclin D1. In fact, we observe a similar effect when we deplete endogenous AMBRA1, resulting in the stabilization of endogenous cyclin D1 in all cell cycle phases (see Figure 2G and Figure supplement 3B). As we had already mentioned in the Discussion section of the original submission, we propose that GTSE1 is phosphorylated by CDK4 and CDK6 particularly in pathological states, such as cancers displaying overexpression of D-type cyclins (i.e., it is possible that the overexpression overcomes the lower affinity of the cyclin D1-GTSE1 complex). In turn, phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. In sum, our study suggests that overexpression of cyclin D1, which is often observed in cancer cells beyond the G1 phase, induces phosphorylation of GTSE1 at all points in the cell cycle.

      Reviewer #2 (public review):

      Summary:

      The manuscript by García-Vázquez et al. identifies the G2 and S phases expressed protein 1 (GTSE1) as a substrate of the CycD-CDK4/6 complex. CycD-CDK4/6 is a key regulator of the G1/S cell cycle restriction point, which commits cells to enter a new cell cycle. This kinase is also an important therapeutic cancer target of approved drugs including palbociclib. Identification of substrates of CycD-CDK4/6 can therefore provide insights into cell cycle regulation and the mechanism of action of cancer therapeutics. A previous study identified GTSE1 as a target of CycB-Cdk1 but this appears to be the first study to address the phosphorylation of the protein by Cdk4/6.

      The authors identified GTSE1 by mining an existing proteomic dataset for proteins that are elevated in AMBRA1 knockout cells. The AMBRA1 complex normally targets D cyclins for degradation. From this list, they then identified proteins that contain a CDK4/6 consensus phosphorylation site and were responsive to treatment with palbociclib.

      The authors show CycD-CDK4/6 overexpression induces a shift in GTSE1 on Phos-tag gels that can be reversed by palbociclib. In vitro kinase assays also showed phosphorylation by CDK4. The phosphorylation sites were then identified by mutagenizing the predicted sites and running Phos-tag gels to see which mutations eliminated the shift.

      The authors go on to show that phosphorylation of GTSE1 affects the steady state level of the protein. Moreover, they show that expression and phosphorylation of GTSE1 confer a growth advantage on tumor cells and correlate with poor prognosis in patients.

      Strengths:

      The biochemical and mutagenesis evidence presented convincingly show that the GTSE1 protein is indeed a target of the CycD-CDK4 kinase. The follow-up experiments begin to show that the phosphorylation state of the protein affects function and has an impact on patient outcomes.

      Weaknesses:

      It is not clear at which stage in the cell cycle GTSE1 is being phosphorylated and how this is affecting the cell cycle. Considering that the protein is also phosphorylated during mitosis by CycB-Cdk1, it is unclear which phosphorylation events may be regulating the protein.

      Please see point (ii) and the last paragraph in the response to Reviewer #1. Moreover, we show that, compared to the amino acids phosphorylated by cyclin D1-CDK4, cyclin B1-CDK1 phosphorylates GTSE1 on either additional residues or different sites (Figure 2H). We also show that expression of a phospho-mimicking GTSE1 mutant leads to accelerated growth and an increase in the cell proliferative index (Figure 4B,C and new Figure supplement 4D-E). Finally, we have also evaluated the cell cycle distributions by flow cytometry (new Figure supplement 4F). These analyses show that the expression of a phospho-mimicking GTSE1 mutant induces a decrease in the percentage of cells in G1 and an increase in the percentage of cells in S, similar to what is observed in AMBRA1 KO cells.

      Reviewer #3 (public review):

      Summary:

      This paper identifies GTSE1 as a potential substrate of cyclin D1-CDK4/6 and shows that GTSE1 correlates with cancer prognosis, probably through an effect on cell proliferation. The main problem is that the phosphorylation analysis relies on the over-expression of cyclin D1. It is unclear if the endogenous cyclin D1 is responsible for any phosphorylation of GTSE1 in vivo, and what, if anything, this moderate amount of GTSE1 phosphorylation does to drive proliferation.

      Strengths:

      There are few bona fide cyclin D1-Cdk4/6 substrates identified to be important in vivo so GTSE1 represents a potentially important finding for the field. Currently, the only cyclin D1 substrates involved in proliferation are the Rb family proteins.

      Weaknesses:

      The main weakness is that it is unclear if the endogenous cyclin D1 is responsible for phosphorylating GTSE1 in the G1 phase. For example, in Figure 2G there doesn't seem to be a higher band in the phos-tag gel in the early time points for the parental cells. This experiment could be redone with the addition of palbociclib to the parental to see if there is a reduction in GTSE1 phosphorylation and an increase in the amount in the G1 phase as predicted by the authors' model. The experiments involving palbociclib do not disentangle cell cycle effects. Adding Cdk4 inhibitors will progressively arrest more and more cells in the G1 phase and so there will be a reduction not just in Cdk4 activity but also in Cdk2 and Cdk1 activity. More experiments, like the serum starvation/release in Figure 2G, with synchronized populations of cells would be needed to disentangle the cell cycle effects of palbociclib treatment.   

      Please see last paragraph in the response to Reviewer #1.  Concerning the experiments involving palbociclib, we limited confounding effects on the cell cycle by treating cells with palbociclib for only 4-6 hours. Under these conditions, there is simply not enough time for S and G2 cells to arrest in G1.

      It is unclear if GTSE1 drives the G1/S transition. Presumably, this is part of the authors' model and should be tested.

      We are not claiming that GTSE1 drives the G1/S transition (please see last paragraph in the response to Reviewer #1). GTSE1 is known to promote cell proliferation, but how it performs this task is not well understood. Our experiments indicate that, when overexpressed, cyclin D1 promotes GTSE1 phosphorylation and its consequent stabilization. In agreement with the literature, we show that higher levels of GTSE1 promote cell proliferation. To measure cell cycle distribution upon expressing various forms of GTSE1, we have now performed FACS analyses (new Figure supplement 4F). These analyses show that the expression of a phospho-mimicking GTSE1 mutant induces a decrease in the percentage of cells in G1 and an increase in the percentage of cells in S, similar to what is observed in AMBRA1 KO cells shown in the same panel and in Simoneschi et al. (Nature 2021, PMC8875297).

      The proliferation assays need to be more quantitative. Figure 4B should be plotted on a log scale so that the slope can be used to infer the proliferation rate of an exponentially increasing population of cells. Figure 4c should be done with more replicates and error analysis since the effects shown in the lower right-hand panel are modest.

      In Figure 4B, we plotted data on a linear scale as done in the past (Donato et al. Nature Cell Biol. 2017, PMC5376241) to better underline the changes in total cell number over time. The experiments in Figure 4B were performed in triplicate, statistical significance was determined using unpaired t-tests (p < 0.05), and error bars represent the mean ± SEM. In Figure 4C, error analysis was not included for simplicity, given the complexity of the data. We have now included the other two sets of experiments (new Figure supplement 4D,E). While the effects shown in the lower right-hand panel of Figure 4C are modest, they demonstrate the same trend as those observed in the AMBRA1 KO cells (Figure 4C and Simoneschi et al., Nature 2021, PMC8875297). It's important to note that this effect is achieved through the stable expression of a single phospho-mimicking protein, whereas AMBRA1 KO cells exhibit changes in numerous cell cycle regulators. Moreover, these effects are obtained by growing cells in culture for only 5 days. A similar impact on cell growth in vivo over an extended period could pose significant risks in the long term.
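
      For readers who wish to relate the two presentations, the reviewer's point about the log scale can be illustrated with a minimal sketch: for an exponentially growing population, the slope of ln(cell number) versus time is the proliferation rate, and ln(2) divided by that slope is the doubling time. The function names and the cell counts below are hypothetical and for illustration only; they are not data or analysis code from the study.

```python
import math

def growth_rate(times, counts):
    """Least-squares slope of ln(count) versus time (per unit of time)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return sxy / sxx

def doubling_time(rate):
    """Doubling time implied by an exponential growth rate."""
    return math.log(2) / rate

# Hypothetical counts that double once per day over 5 days (illustration only)
days = [0, 1, 2, 3, 4, 5]
cells = [1000 * 2 ** d for d in days]
rate = growth_rate(days, cells)
```

      On a linear axis the same hypothetical data emphasize the absolute difference in cell number at late time points, which is the aspect the linear plot in Figure 4B is meant to highlight.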

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1E is referenced before 1D. The authors should consider switching D and E.

      Done.

      Figure 1D-E: The authors correctly note in the introduction that GTSE1 is encoded by a cell cycle-dependently expressed gene. Given that cell cycle genes are often associated with poor prognosis (e.g., see Whitfield et al., 2006 Nat. Rev. Cancer), this would be expected to correlate with poor prognosis. This should be mentioned in the results section.

      We agree that the overexpression of certain (but not all) cell cycle-regulated genes is prognostically unfavorable across various cancer types, and we cited Whitfield et al., 2006 Nat. Rev. Cancer. However, our data indicate that phosphorylation of GTSE1 induces its stabilization and, consequently, its levels no longer oscillate during the cell cycle (new Figure 2G and Figure supplement 3B). Moreover, analyzing data from the Clinical Proteomic Tumor Analysis Consortium, we observed an enrichment of GTSE1 phospho-peptides (normalized to total protein) within a pan-cancer cohort as opposed to adjacent, corresponding normal tissues (Figure 2I).

      Figure 2F: Contrast is too high. Blot images should not contain fully saturated black or white.

      We corrected the contrast.

      Figure 2G and Figure Supplement 3B: It looks like AMBRA1 KO cells do not synchronize properly in response to serum withdrawal. The cell cycle distribution should be checked by FACS. Otherwise, it is unclear whether changes in GTSE1 (phospho) levels are only due to indirect changes in the cell cycle distribution.

      Synchronization of both parental and AMBRA1 KO cells is demonstrated by the fact that the phosphorylation of Histone H3 peaks at 32 hours after serum readdition in both cases (Figure supplement 3B). 

      Figure 2I: It is important that phospho-GTSE1 levels are normalized to total GTSE1 levels to understand the distinct contributions of changes in GTSE1 levels and of CCND1-CDK4-driven phosphorylation.

      Done.

      Figure 3A-B: These experiments should also be controlled for cell cycle distribution. Is this effect specific to GTSE1 and other AMBRA1 targets or are other G2/M cell cycle proteins also affected?

      The relatively short half-life of GTSE1 (<4 hours) makes its levels sensitive to acute treatments such as Palbociclib or acute AMBRA1 depletion. The effects of these treatments on GTSE1 levels are measurable within a time frame too short to significantly affect cell cycle progression. For example, we used cells with fusion of endogenous AMBRA1 to a mini-Auxin Inducible Degron (mAID) at the N-terminus. This system allows for rapid and inducible degradation of AMBRA1 upon addition of auxin, thereby minimizing compensatory cellular rewiring. Again, we observed an increase in GTSE1 levels upon acute ablation of AMBRA1 (i.e., in 8 hours) (Figure 3B), when no significant effects on cell cycle distribution are observed (please see Simoneschi et al., Nature 2021, PMC8875297 and Rona et al., Mol. Cell 2024, PMC10997477).

      Figure 4: It should be noted that the correlation with cell proliferation and cell cycle protein expression is expected for any cell cycle protein, including GTSE1.

      Actually, the main point of Figure 4 is to show that expression of the phospho-mimicking mutant of GTSE1 promotes cell proliferation. Comparative analysis revealed that cells overexpressing either wild-type GTSE1 or its phospho-deficient form exhibited significantly reduced proliferation rates compared to those expressing the phospho-mimicking mutant (Figure 4B,C). 

      The two-decades-old references 33 and 34 are not well suited to support the notion for Cyclin D1 that "the full spectrum of substrates and their impact on cellular function and oncogenesis remain poorly explored." More recent references should be used to show that this is still the case.

      We added more recent references.

      The authors conclude that their "data indicate that cyclin D1-CDK4 is responsible for the phosphorylation of GTSE1 on four residues (S91, S262, S454, and S724)." However, the authors' data do not exclude a role for their siblings cyclin D2, cyclin D3, and CDK6. Reflecting this, the conclusions should be toned down.

      The analysis of the sites phosphorylated in GTSE1 was performed by experimentally co-expressing cyclin D1-CDK4 (Figure 2F, Figure 2H, and Figure supplement 3A), hence our statement.  Yet, we agree that in cells, cyclin D2, cyclin D3, and CDK6 can contribute to GTSE1 phosphorylation. 

      The authors claim that they "observed that in human cells, when D-type cyclins are stabilized in the absence of AMBRA1, GTSE1 becomes phosphorylated also in G1." However, the G1-specific data presented by the authors are not controlled for, and it is unclear whether these phosphorylation events actually occur in G1 cells.

      We now provide a WB in which GTSE1 phosphorylation is more evident (top panel of the new Figure 2G) (please see point (ii) in the response to the public review of Reviewer #1).  This experiment clearly shows that in AMBRA1 KO cells, GTSE1 is phosphorylated at all points in the cell cycle. Synchronization of both parental and AMBRA1 KO cells is demonstrated by the fact that phosphorylation of Histone H3 peaks at 32 hours after serum re-addition in both cases (Figure supplement 3B). 

      Reviewer #2 (Recommendations for the authors):

      (1) It is not clear from the presented data at which point in the cell cycle that phosphorylation of GTSE1 may be affecting the steady state level of the protein. The implication that GTSE1 is a target of CycD-CDK4 would suggest that the protein is stabilized at G1/S. Can this effect be observed?

      Please see the last paragraph in the response to the public review of Reviewer #1.

      (2) Considering the previous study showing that GTSE1 is also phosphorylated during mitosis by CycB-Cdk1, do levels of GTSE1 protein change during the cell cycle? Do changes in GTSE1 levels correlate with phosphorylation during the cell cycle? Cell synchronization experiments such as double thymidine and subsequent phostag analysis could shed some light on these questions.

      Please see the last paragraph in the response to the public review of Reviewer #1.

      (3) The authors show that the phosphomimetic mutants of GTSE1 confer a growth advantage on cells. The mechanism of this growth advantage is unclear. Is this effect due to a shorter cell cycle, enhanced survival, or another mechanism?

      We did not observe increased cell survival when the phosphomimetic mutants of GTSE1 are expressed. We show that phosphorylation of GTSE1 induces its stabilization, leading to increased levels that, as expected based on the existing literature, contribute to enhanced cell proliferation. So, the role of the cyclin D1-CDK4/6 kinase-dependent phosphorylation of GTSE1 is to stabilize GTSE1.

      (4) Other minor points - all of the presented immunoblots do not show molecular weight markers. The IF images require scale bars.

      To prevent overcrowding of the Figures, the sizes of blotted proteins are indicated in the uncropped scans of each blot. Uncropped scans have been deposited in Mendeley at:  https://data.mendeley.com/datasets/xzkw7hrwjr/1. Scale bars have been added to the IF images.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have leveraged single-cell RNA sequencing of the various stages of the evolution of lung adenocarcinoma to identify the population of macrophages that contribute to tumor progression. They show that S100a4+ alveolar macrophages, which are active in fatty acid metabolism such as palmitic acid metabolism, seem to drive the atypical adenomatous hyperplasia (AAH) stage. These macrophages also seem to induce angiogenesis, promoting tumor growth. Similar types of macrophage infiltration were demonstrated in the progression of human lung adenocarcinomas.

      Strengths:

      Identification of the metabolic pathways that promote angiogenesis-dependent progression of lung adenocarcinomas from early atypical changes to aggressive invasive phenotype could lead to the development of strategies to abort tumor progression.

      We are grateful for your constructive comments. These comments are very helpful for revising and improving our paper and have provided important guiding significance to our study. We have made revisions according to your comments and have provided point-by-point responses to your concerns.

      Weaknesses:

      (1) Can the authors demonstrate what are the functional specialization of the S100a4+ alveolar macrophages that promote the progression of the AAH to the more aggressive phenotype? What are the factors produced by these unique macrophages that induce tumor progression and invasiveness?

      Thank you for your comments. To more comprehensively characterize the functional specialization of the S100a4<sup>+</sup> alveolar macrophages, we expanded the macrophage functional gene sets based on relevant literature and databases and performed enrichment analysis. The results showed that all stages of precancerous progression presented activated states of angiogenesis, M2-like and immunosuppressive functions relative to the normal stage (Figure 4B). As we have demonstrated, S100a4<sup>+</sup> alveolar macrophages predominantly exert pro-angiogenic functions during the AAH phase and may be more biased towards M2-like polarization and immunosuppression during further disease progression. Consistently, the S100A4<sup>+</sup> macrophage subset has been shown to exhibit an M2-like phenotype with immunosuppressive properties in tumor progression [PMID: 34145030]. In addition, S100A4 has been reported to be associated with macrophage M2 polarization, angiogenesis, and tumorigenesis [PMID: 39664586, 36895491, 30221056, 32117590]. The functional status of human S100A4<sup>+</sup> alveolar macrophages is basically the same. The relevant description was added to the Results section as follows: “It was revealed that the capacities for angiogenesis, M2-like polarization, and immunosuppression were found to be stronger in AAH or other precancerous stages relative to the normal stage (Figure 4B). The pro-angiogenic function predominated in the AAH stage, while M2-like and immunosuppressive functions were more prominent in the subsequent precancerous progression.” (page 11, line 262). Our study focuses on the functional phenotypic changes of S100a4<sup>+</sup> alveolar macrophages during the progression from normal to AAH to explain the role of this subpopulation in tumor initiation, and similarly, preliminary coculture experiments could only indicate its role in the early malignant transformation of epithelial cells.
      In further experimental validation, we will confirm the above functions of the S100a4<sup>+</sup> alveolar macrophages in promoting the progression of AAH to the more aggressive phenotype by in vitro and in vivo experiments. We have added the limitations and potential experimental designs to the Discussion section as follows: “It is worth noting that our mining of S100a4<sup>+</sup> alv-macro remains at the precancerous initiation stage, and further experimental designs are needed to verify its specific contribution at more aggressive stages. For example, FACS sorting of the subpopulation at different stages of disease progression, respectively, for precise functional characterization;” (page 19, line 468).

      For the factors produced by these unique macrophages during induction of malignant transformation, we assayed the culture supernatant of S100a4-OE alveolar macrophages for secreted functional cytokines. The results showed up-regulation of MIP-2, HGF, TNF-α, IL-1α, CD27, CT-1, MMP9, 4-1BB, and CD40, and GO enrichment showed angiogenesis and tumorigenesis-related processes (Figure 5L and 5M). We have added the detailed content to the Results section as follows: “Next, we detected tumor-inducing factors secreted by these unique macrophages using Cytokine Antibody Array. We noted the production of macrophage inflammatory protein (MIP)-2, hepatocyte growth factor (HGF), tumor necrosis factor α (TNF-α), IL-1α, MMP9, and CD40, and these cytokine-related biological processes were mainly involved in the regulation of angiogenesis and immune response (Figure 5L and 5M).” (page 13, line 319). Furthermore, changes in these cytokines during subsequent invasive tumor progression will also be continuously monitored. The following description has been added to the Discussion section: “Furthermore, TGF-β and HGF activate vascular endothelial cells and promote proliferation and migration, as well as induce the expression of pro-angiogenic factors such as VEGF (Vimalraj, 2022; Watabe, Takahashi, Pietras, & Yoshimatsu, 2023). Macrophage-derived TNF-α and IL-1α lead tumor cells to produce potent angiogenic factors IL-8 and VEGF, which affect angiogenesis and tumor growth (Torisu et al., 2000). MIP2 and CD40 were also identified as pro-tumor factors associated with angiogenesis (Kollmar, Scheuer, Menger, & Schilling, 2006; Murugaiyan, Martin, & Saha, 2007)…continuous monitoring of the fluctuation of the above factors in bronchoalveolar lavage fluid at corresponding periods;” (page 19, line 461).

      All method details covered in this section have been updated in the Materials and methods.

      (2) Angiogenic factors are not only produced by the S100a4+ cells but also by pericytes and potentially by the tumor cells themselves. Then, how do these factors aberrantly trigger tumor angiogenesis that drives tumor growth?

      Thank you for your comment. In our study, we detected up-regulation of angiogenic factors HIF-1α, VEGF, MMP9, and TGF-β (Figure 5K), and elevation of secreted HGF, IL-1α, and TNF-α (Figure 5L). We provide a detailed description of how these factors are involved in angiogenesis-related tumorigenesis to varying degrees in the Discussion section: “Precancerous lesions of LUAD are angiogenic, and pro-angiogenic factors secreted by cells, including S100a4<sup>+</sup> alv-macro, induce endothelial cell sprouting and chemotaxis, leaving the angiogenic switch activated, prompting the formation of new blood vessels on the basis of the original ones to supply oxygen and nutrients to sustain tumor initiation (Chen et al., 2024; Kayser et al., 2003; van Hinsbergh & Koolwijk, 2008). Under hypoxic conditions, HIF-1α activates numerous factors that contribute to the angiogenic process, including VEGF, which promotes vascular permeability, and MMP9, which breaks down the ECM, promotes endothelial cell migration, and recruits pericytes to provide structural support (Raza, Franklin, & Dudek, 2010; Sakurai & Kudo, 2011). Cytokines secreted into the microenvironment activate macrophages, which subsequently produce angiogenic factors, further promoting angiogenesis (Sica, Schioppa, Mantovani, & Allavena, 2006). Furthermore, TGF-β and HGF activate vascular endothelial cells and promote proliferation and migration, as well as induce the expression of pro-angiogenic factors such as VEGF (Vimalraj, 2022; Watabe, Takahashi, Pietras, & Yoshimatsu, 2023). Macrophage-derived TNF-α and IL-1α lead tumor cells to produce potent angiogenic factors IL-8 and VEGF, which affect angiogenesis and tumor growth (Torisu et al., 2000)…” (page 19, line 449).

      (3) It is not clear how abnormal fatty acid uptake by the macrophages drives the progression of tumors.

      Thank you for your comment, which coincides with our mechanistic exploration. The metabolic status of macrophages influences their pro-tumor properties, and lipid metabolism has been shown to determine the functional polarization of macrophages [PMID: 29111350]. In this study, we observed more accumulation of lipid droplets in S100a4-OE MH-S, demonstrating enhanced cellular fatty acid uptake (Figure 6A). The pro-angiogenic ability of S100a4<sup>+</sup> alv-macro was confirmed by tube formation assay and cytokine assay (Figure 6B and 5M). Cpt1a was thought to play a crucial role in the metabolic paradigm shift of S100a4<sup>+</sup> alv-macro, we therefore performed functional rescue experiments by inhibiting CPT1A expression in S100a4-OE MH-S by addition of etomoxir (ETO). After culture with conditioned medium of MH-S, the proliferation, migration, and ROS production of MLE12 cells were all restored to lower levels (Figure 6E-G). In addition, ETO treatment significantly reversed the angiogenesis, which supported the regulation of fatty acid metabolism on macrophage function (Figure 6H). Immunoblotting also revealed restoration of expression in related proteins (Figure 6I and 6J), these findings reinforced previous analyses of the association of fatty acid metabolism with pro-angiogenesis and M2-like function in S100a4<sup>+</sup> alv-macro. The involvement of PPAR-γ in the regulation of metabolic state was also confirmed. Taken together, we suggest that S100a4<sup>+</sup> alv-macro promotes fatty acid metabolism through the CPT1A-PPAR-γ axis, enhances its ability to promote angiogenesis, and thus drives tumor occurrence. The corresponding contents were added in the Results section S100a4<sup>+</sup> alv-macro drove angiogenesis by promoting Cpt1a-mediated fatty acid metabolism (page 13, line 327) and Discussion section: “We demonstrated the regulation of fatty acid metabolism by CPT1A in S100a4<sup>+</sup> alv-macro as well as the involvement of PPAR-γ. 
Nevertheless, the molecular mechanism that drives the acquisition of metabolic and functional switching properties specific to this cell state still requires further characterization in the context of precancerous lesions. It has been reported that CD36 is the main effector of the S100A4/PPAR-γ pathway, and its mediated fatty acid uptake plays an important role in the tumor-promoting function of macrophages (S. Liu et al., 2021).” (page 18, line 433).

      All method details covered in this section have been supplemented in the Materials and methods.

      (4) Does infusion or introduction of S100a4+ polarized macrophages promote the progression of AAH to a more aggressive phenotype?

      Thank you for your comment. We performed intratracheal instillation of lentivirus-infected S100a4-OE MH-S and culture supernatant in A/J and BALB/c mice, respectively, but no aggressive pathological phenotype has been observed so far, possibly because insufficient time has elapsed for lesions to develop or because the experimental conditions were suboptimal. We will continue to explore the instillation dose and frequency for long-term monitoring and will simultaneously evaluate the availability of primary alveolar macrophages. We have discussed this as follows: “It is worth noting that our mining of S100a4<sup>+</sup> alv-macro remains at the precancerous initiation stage, and further experimental designs are needed to verify its specific contribution at more aggressive stages…and intratracheal instillation of primary S100a4<sup>+</sup> alv-macro to observe the pathological progression of precancerous lesions.” (page 19, line 468).

      (5) How does Anxa and Ramp1 induction in inflammatory cells induce angiogenesis and tumor progression?

      Thank you for your comment. ANXA2 is an important member of the annexin family of proteins and is expressed on the surface of endothelial cells, macrophages, and tumor cells [PMID: 30125343]. ANXA2 was reported to regulate neoangiogenesis in the tumor microenvironment, most likely through overproduction of plasmin. As a well-established receptor for plasminogen (PLG) and tissue plasminogen activator (tPA) on the cell surface, ANXA2 converts PLG into plasmin. Plasmin plays a critical role in activating a cascade of inactive proteolytic enzymes such as metalloproteases (pro-MMPs) and latent growth factors (VEGF and bFGF) [PMID: 12963694, 11487021]. Activated forms of MMPs and VEGF then induce extracellular matrix remodeling, facilitating angiogenesis and tumor development [PMID: 15788416]. Sharma et al. suggested that administration of an ANXA2 antibody inhibited tumor angiogenesis and growth concurrent with plasmin generation [PMID: 22044461]; the role of ANXA2 in plasmin activation thus explains its importance in tumor-related angiogenesis. We verified the simultaneous upregulation of ANXA2 and PLG in S100a4-OE MH-S and cocultured HUVEC and MLE12 by immunoblotting (Figure 6D). The relevant description was added to the Results section as follows: “ANXA2 is considered to be a cellular receptor for plasminogen (PLG), often expressed on the surface of endothelial cells, macrophages, and tumor cells, which activates a cascade of pro-angiogenic factors by promoting the conversion of PLG to plasmin, thereby promoting angiogenesis and tumor progression (Semov et al., 2005; Sharma, 2019). We found synergistic upregulation of ANXA2 and PLG expression in S100a4-OE MH-S and cocultured HUVEC and MLE12, which may help explain how ANXA2 induction was involved in angiogenesis and malignant transformation (Figure 6D).” (page 14, line 338).

      Recent studies showed that S100A4 is associated with tumor angiogenesis and progression through its interaction with ANXA2. ANXA2 is the endothelial receptor for S100A4, and their interaction triggers the functional activity directly related to the pathological properties of S100A4, including angiogenesis [PMID: 18608216]. It has been shown that S100A4 induces angiogenesis through interaction with ANXA2 and accelerated plasmin formation [PMID: 15788416, 25303710]. In addition, it is generally believed that ANXA2 participates in malignant cell transformation [PMID: 28867585]. Therefore, we speculate that ANXA2 may promote plasmin production by binding to S100A4, thus promoting angiogenesis and tumor initiation, and we have discussed this accordingly: “The role of ANXA2 in angiogenesis has been widely recognized, and it may facilitate plasmin production by binding to S100A4 and then trigger angiogenesis and malignant cell transformation (Grindheim, Saraste, & Vedeler, 2017; Y. Liu, Myrvang, & Dekker, 2015).” (page 18, line 446).

      In our study, the primary target of our validation was ANXA2 rather than RAMP1. Although the relationship between RAMP1 and angiogenesis has been established [PMID: 20596610], we have toned down the relevant description in the manuscript.

      (6) For the in vitro studies the authors might consider using primary tumor cells and not cell lines.

      Thank you for your suggestion, which was part of our initial experimental plan. However, since S100A4 is not expressed on the cell surface, FACS sorting of the primary alveolar macrophage subset presents technical limitations. We have also attempted overexpression in primary macrophages, but the current overexpression efficiency and cell status are not sufficient to support a subsequent series of experiments. For these reasons, the alveolar macrophage cell line MH-S and the lung epithelial cell line MLE12 were selected to ensure the consistency and stability of the coculture system.

      In addition, we are optimizing the experimental conditions to achieve coculture of primary macrophages and epithelial cells, and will also establish transgenic mouse models for simultaneous validation. The Discussion has been added as: “Besides, as our previous in vitro results were obtained based on cell lines, we will optimize the experimental conditions to achieve coculture of primary macrophage subset and epithelial cells and establish transgenic mouse models for in vivo validation.” (page 19, line 475).

      Reviewer #2 (Public review):

      Summary:

      The work aims to further understand the role of macrophages in lung precancer/lung cancer evolution.

      Strengths:

      (1) The use of single-cell RNA seq to provide comprehensive characterisation.

      (2) Characterisation of cross-talk between macrophages and the lung precancerous cells.

      (3) Functional validation of the effects of S100a4+ cells on lung precancerous cells using in vitro assays.

      (4) Validation in human tissue samples of lung precancer / invasive lesions.

      We are grateful for your constructive comments. These comments are very helpful for revising and improving our paper and have provided important guiding significance to our study. We have made revisions according to your comments and have provided point-by-point responses to your concerns.

      Weaknesses:

      (1) The authors need to provide clarification of several points in the text.

      Thank you for your comment. We have clarified these points in the manuscript and responded to all your concerns in detail. Please see the responses to Recommendations for the authors.

      (2) The authors need to carefully assess their assumptions regarding the role of macrophages in angiogenesis in precancerous lesions.

      Thank you for your comment. We have cited relevant literature to support the occurrence of angiogenesis in precancerous lesions, and demonstrated the contribution of S100a4<sup>+</sup> alveolar macrophages by tube formation assay and cytokine assay. In addition, we have discussed the relevant limitations of this study and aimed to provide more robust evidence. Please see the responses to Recommendations for the authors.

      (3) The authors should discuss more broadly the current state of anti-macrophage therapies in the clinic.

      Thank you for your suggestion. We have provided extensive discussion of the clinical state of anti-macrophage therapies. Please see the responses to Recommendations for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The text has grammatical and syntax errors that need to be corrected accordingly.

      Thank you for your suggestion. We have corrected the grammatical and syntactic errors and asked a native English speaker in the field to help polish the full text.

      Reviewer #2 (Recommendations for the authors):

      This work provides an important contribution to our further understanding of the role of macrophages in lung precancer/lung cancer evolution. I have several comments regarding how the manuscript could be improved:

      Introduction:

      The authors may consider citing the following work to enhance their work:

      (1) At line 78, where they talk about precancerous lesions being reversible, they should cite recent work on this in lung cancer: Teixeira et al 2019 PMID: 30664780, and Pennycuick et al 2020 PMID: 32690541.

      Thank you for your suggestion. We have cited the above references in the corresponding paragraph (page 4, line 76).

      (2) At line 96, where they talk about developing medicines for precancerous lesions, the authors should cite comprehensive review articles where this concept has been discussed in depth, for example: Reynolds et al 2023 PMID: 37067191, and Asad et al 2012 PMID: 23151603.

      Thank you for your suggestion. We have cited the above references in the corresponding paragraph (page 5, line 94).

      Results:

      (1) Line 142, the authors say "mice were feed for 12-16 months" - do they mean the mice were maintained for 12-16 months?

      Thank you for your comment. To best mimic the process of human lung cancer development, A/J mice with the highest incidence of spontaneous lung tumors, which increases substantially with age, were selected. The corresponding description has been modified as: “A/J mice have the highest incidence of spontaneous lung tumors among various mouse strains, and this probability significantly increased with age (Landau, Wang, Yang, Ding, & Yang, 1998). To more comprehensively mirror the tumor initiation and progression process of human lung cancer, A/J mice were maintained for 12-16 months for spontaneous lesions, which resulted in three recognizable precancerous lesions in the lung.” (page 7, line 138).

      (2) Line 143, the authors claim to have seen "three recognizable precancerous and cancerous lesions in the lung" but then, they only go on to describe AAH, adenoma, and AIS, lesions which are all commonly recognized as precancers. What was the cancerous (i.e. invasive) lesion they identified?

      Thank you for your comment. We apologize for this misstatement and will include cancerous lesions from mice for simultaneous analysis in a subsequent study. The corresponding description has been revised as: “To more comprehensively mirror the tumor initiation and progression process of human lung cancer, A/J mice were maintained for 12-16 months for spontaneous lesions, which resulted in three recognizable precancerous lesions in the lung.” (page 7, line 140).

      (3) Line 172, the authors say that the "proportion of cell types across the four stages showed a dynamic trend" ... what does this mean? A trend towards what exactly?

      Thank you for your comment. Our intention was to highlight heterogeneous changes, and the description has been corrected: “The proportion of cell types across the four stages showed irregular changes, while transcriptional homogeneity was reduced with precancerous progression, illustrating the importance of heterogeneity in tumorigenesis and also proving the reliability of the sampling in this study.” (page 8, line 169).

      (4) Line 193, the authors say cell communication "showed a tendency to malignant transformation." What does this statement mean? If they mean more cell communication occurred in the malignant lesions than the precancerous, then there is a flaw in the logic because AAH, adenoma, and AIS are all precancerous lesions. What is the sequence of evolution to malignancy the authors are assuming? Do they mean AIS is a more advanced stage of precancerous malignancy than adenoma, and adenoma is more advanced than AAH (albeit they are all precancerous lesions).

      Thank you for your comments. The malignant transformation process involves multiple stages, and histological AAH is regarded as the beginning of this process. Precancerous lesions of LUAD in mice are believed to develop stepwise from AAH, adenoma, to AIS, even if the process is not necessarily completely consistent [PMID: 11235908, 32707077]. What we meant to describe was a gradual increase in the frequency of cell communication during this process. The corresponding description has been modified as: “At the evolutionary stages of precancerous LUAD, despite possible sample heterogeneity and other interference, we observed increased interactions between epithelial cells and surrounding stromal and immune cells in the microenvironment, indicating gradually frequent cell-cell communication during this process” (page 8, line 187).

      (5) Immunofluorescence images in Figure 3G and Figure 4F are captured at low magnification, making it very difficult to evaluate the colocalisation data. Suggest authors provide higher magnification images.

      Thank you for your suggestion. We have replaced the immunofluorescence images in Figure 3G and Figure 4F with higher magnification images.

      (6) Line 284 when referencing the cell line here, the author should make it clear in the text that cells were transfected with a construct expressing S100A4. If possible, would be good to understand if the level of S100A4 expression achieved is less, similar, or greater than that seen in these cells in vivo.

      Thank you for your suggestion. We have amended the text to make it clear: “S100a4-overexpressed (OE) alveolar macrophages were established by transfection of the mS100a4 vector into the murine MH-S cell line, and empty vector was transfected as negative control (NC) cells” (page 12, line 284). In follow-up work, we will clarify whether the level of S100a4 expression achieved is less than, similar to, or greater than that seen in these cells in vivo.

      (7) Line 285 - when the authors first refer to OE cells that have been transfected, they should also inform the reader what NC cells are i.e. negative control cells?

      Thank you for your suggestion. We have revised the relevant content as follows: “S100a4-overexpressed (OE) alveolar macrophages were established by transfection of the mS100a4 vector into the murine MH-S cell line, and empty vector was transfected as negative control (NC) cells” (page 12, line 284).

      (8) Line 324 - the authors claim they have demonstrated that the macrophages promote angiogenesis through upregulation of fatty acid metabolism. Whilst they may have demonstrated changes in fatty acid metabolism, no experiments assessing the effect of the macrophages in angiogenesis assays are included in the paper, so the authors should modify this statement.

      Thank you for your comments. The relevant experiments have been added based on your suggestions. First, we demonstrated in vitro the up-regulation of fatty acid metabolism in S100a4<sup>+</sup> alv-macro and uncovered the contribution of CPT1A to angiogenesis and cell transformation through rescue experiments. Then, the HUVEC tube formation assay and cytokine assay confirmed the pro-angiogenic effect of S100a4<sup>+</sup> alv-macro. We have added the Results section S100a4<sup>+</sup> alv-macro drove angiogenesis by promoting Cpt1a-mediated fatty acid metabolism (page 13, line 327) and added the following to the Discussion: “We demonstrated the regulation of fatty acid metabolism by CPT1A in S100a4<sup>+</sup> alv-macro as well as the involvement of PPAR-γ. Nevertheless, the molecular mechanism that drives the acquisition of metabolic and functional switching properties specific to this cell state still requires further characterization in the context of precancerous lesions. It has been reported that CD36 is the main effector of the S100A4/PPAR-γ pathway, and its mediated fatty acid uptake plays an important role in the tumor-promoting function of macrophages (S. Liu et al., 2021).” (page 18, line 433).

      All method details covered in this section have been supplemented in the Materials and methods.

      (9) Regarding angiogenesis in precancerous lesions and the role of macrophages in this process: is there even any evidence that precancerous LUAD lesions are angiogenic? Don't these lesions typically have a lepidic pattern, wherein the cancer cells merely co-opt pre-existing alveolar capillaries without the need to generate new vessels?

      Thank you for your comments. As you mentioned, pathologically, precancerous LUAD lesions mainly show a lepidic growth pattern, characterized by the growth of type II alveolar epithelial cells along pre-existing alveolar walls [PMID: 29690599], but this does not mean that the process proceeds without the formation of new blood vessels. There are multiple patterns of tumor angiogenesis. Some studies have shown that increased angiogenesis can be observed in certain precancerous lesions, which suggests that angiogenesis may play an important role in the early stages of lung cancer development. Microvessel density (MVD) was increased in AAH and AIS compared to normal lung tissue, indicating that new blood vessels are forming to provide essential nutrients and oxygen to tumor cells to support their growth. The expression level of pro-angiogenic factors such as VEGF is usually upregulated, which promotes the formation of new blood vessels by stimulating endothelial cell proliferation and migration [PMID: 39570802, 14568684]. In addition, the infiltration of macrophages into precancerous areas in response to cytokines has been shown to trigger a tumor angiogenic switch and maintain tumor-associated continuous angiogenesis [PMID: 35022204]. Our in vitro tube formation assay and cytokine assay also demonstrated angiogenesis induced by S100a4<sup>+</sup> alv-macro. We have discussed the relevant content (page 19, line 449) and will provide more evidence in future work.

      Discussion:

      Perhaps the authors can cite any literature pertaining to the current wave of anti-macrophage therapies currently being tested in the clinic. Moreover, have these therapies been tested in lung cancer, and if so, what were the results?

      Thank you for your suggestion. At present, clinical trials of anti-macrophage therapies mainly involve Gaucher's disease and hematological malignancies, and the two trials related to lung cancer have not posted valid data. Nevertheless, there are some preclinical studies worth learning from. We have cited the relevant literature and discussed it in detail: “With the elaborate resolution of TME, macrophage-related therapy is considered to be promising. So far, macrophage-targeted therapy has demonstrated clinical efficacy in Gaucher's disease and advanced hematological malignancies (Barton et al., 1991; Ossenkoppele et al., 2013). In lung cancer, an attempt to enhance anti-PD-1 therapy in NSCLC by depleting myeloid-derived suppressor cells with gemcitabine was prematurely terminated because of insufficient data collected; another clinical trial of TQB2928 monoclonal antibody promoting macrophage phagocytosis of tumor cells in combination with a third-generation EGFR TKI for advanced NSCLC is now recruiting. Moreover, preclinical studies on macrophage-targeted therapy combined with immune checkpoint inhibitors are being extensively conducted in NSCLC, and it was suggested that blockade of purine metabolism can reverse macrophage immunosuppression, and a synergetic effect can be achieved when combined with anti-PD-L1 therapy, which inspired the direction of our early intervention strategies (H. Wang, Arulraj, Anbari, & Popel, 2024; Yang et al., 2025).” (page 20, line 479).

      Methods:

      Further description of how lesions were classified as precancerous (AAH, adenoma, AIS) or cancerous by the pathologist should be defined (or cite appropriate reference where this is described).

      Thank you for your suggestion. We have cited relevant references in the Methods section (page 21, line 528) on how lesions were classified by the pathologists [PMID: 21252716, 28951454, 32707077, 24811831].

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study combines predictions from MD simulations with sophisticated experimental approaches including native mass spectrometry (nMS), cryo-EM, and thermal protein stability assays to investigate the molecular determinants of cardiolipin (CDL) binding and binding-induced protein stability/function of an engineered model protein (ROCKET), as well as of the native E. coli intramembrane rhomboid protease, GlpG.

      Strengths:

      State-of-the-art approaches and sharply focused experimental investigation lend credence to the conclusions drawn. Stable CDL binding is accommodated by a largely degenerate protein fold that combines interactions from distant basic residues with greater intercalation of the lipid within the protein structure. Surprisingly, there appears to be no direct correlation between binding affinity/occupancy and protein stability.

      Weaknesses:

      (i) While aromatic residues (in particular Trp) appear to be clearly involved in the CDL interaction, there is no investigation of their roles and contributions relative to the positively charged residues (R and K) investigated here. How do aromatics contribute to CDL binding and protein stability, and are they differential in nature (W vs Y vs F)?

      Based on the simulations in Corey et al (Sci Adv 2021), aromatic residues, especially tryptophan, appear to help provide a binding platform for the glycerol moiety of CDL which is quite flat. This interaction is likely why we generally see the tryptophan slightly further into the plane of the membrane than the basic residues, where it may help to orient the lipid. Unlike charge interactions with lipid head groups, such subtle contributions are likely distorted by the transfer to the gas phase, making it difficult to confidently assign changes in stability or lipid occupancy to interactions with tryptophan. We have added an explanation of these considerations to the Discussion section (page 13, last paragraph).

      (ii) In the case of GlpG, a WR pair (W136-R137) present at the lipid-water interface on the periplasmic face (adjacent to helices 2/3) may function akin to the W12-R13 of ROCKET in specifically binding CDL. Investigation of this site might prove to be interesting if it indeed does.

      Thank you for the suggestion. In our CG simulations, we don’t see significant CDL binding at this site, likely because there is just a single basic residue. We note that there is a periplasmic site nearby with two basic residues (K132+K191+W125) that has a higher occupancy, although still far lower than that of the identified cytoplasmic site. In general, periplasmic sites are less common and/or have lower affinity, which may be related to leaflet asymmetry (Corey et al, Sci Adv 2021). We added the CDL density plot for the periplasmic side to Figure S7 and noted this on page 9, next-to-last paragraph.

      (iii) Examples of other native proteins that utilize combinatorial aromatic and electrostatic interactions to bind CDL would provide a broader perspective of the general applicability of these findings to the reader (e.g. the adenine nucleotide translocase (ANT/AAC) of the mitochondria as well as the mechanoenzymatic GTPase Drp1 appear to bind CDL using the common "WRG" motif).

      Several confirmed examples are presented in Corey et al (Sci Adv 2021), the dataset which we used to identify the CDL site in GlpG. So essentially, our broader perspective is that we test the common features observed in native proteins in an artificial system. While it is not clear how a peripheral membrane protein like Drp1 fits into this framework, the CDL binding sites in ANTs indeed have the same hallmarks as the one in GlpG (Hedger et al, Biochemistry 2016). We recently contributed to a study demonstrating that the tertiary structure of ANT Aac2 is stabilized by co-purified CDL molecules, underscoring the general validity of our findings (Senoo et al, EMBO J 2024). We have added this information to the discussion, pg 12, third paragraph, and added a figure (S8, see below) to highlight the architecture of the Aac2-CDL complex.

      Overall, using both model and native protein systems, this study convincingly underscores the molecular and structural requirements for CDL binding and binding-induced membrane protein stability. This work provides much-needed insight into the poorly understood nature of protein-CDL interactions.

      We thank the reviewer for the positive assessment!

      Reviewer #2 (Public review):

      Summary:

      The work in this paper discusses the use of CG-MD simulations and nMS to describe cardiolipin binding sites in a synthetically designed protein, which can be extrapolated to a naturally occurring membrane protein. While the authors acknowledge their work illuminates the challenges in engineering lipid binding, they are able to describe some features that highlight residues within GlpG that may be involved in lipid regulation of protease activity, although further study of this site is required to confirm its role in protein activity.

      Comments

      Discrepancy between total CDL binding in CG simulations (Fig 1d) and nMS (Fig 2b,c) should be further discussed. Limitations in nMS methodology selecting for tightest bound lipids?

      We thank the reviewer for pointing out that this needs to be clarified. We analyze proteins in detergent, which is in itself delipidating, because detergent molecules compete with the lipids for binding to the protein, an effect that can be observed in MS (Bolla et al, Angew Chemie Int. Ed. 2020). Native MS of membrane proteins requires stripping of the surrounding lipid vesicle or detergent micelle in the vacuum region of the mass spectrometer, which is done through gentle thermal activation in the form of high-energy collisions with gas molecules. Detergent molecules and lipids not directly in contact with the protein generally dissociate more easily than bound lipids (Laganowsky et al, Nature 2014); however, even loosely bound lipids can readily dissociate with the detergent, artificially reducing occupancy. The nMS data is therefore likely biased towards tightly bound lipids (e.g. via electrostatic headgroup interactions); however, these are the lipids we are interested in, meaning that the use of MS is suitable here. We have noted this in the Discussion, last paragraph on page 12.

      Mutation of helical residues to alanine not only results in loss of lipid binding residues but may also impact overall helix flexibility, is this observed by the authors in CG-MD simulations? Change in helix overall RMSD throughout simulation? The figures shown in Fig.1H show what appear to be quite significant differences in APO protein arrangement between ROCKET and ROCKET AAXWA.

      For most of the study, we use CG with fixed backbone bead properties as well as an elastic network to maintain tertiary structure. This means that a mutation to alanine will have essentially no impact on the stability of the helix or protein in general in the CG simulations in the bilayer. It should be noted that Figure 1H shows snapshots from atomistic gas-phase simulations with pulling force applied (see schematic in Figure 1F, as well as Figure S1 for end-point structures), where we naturally expect large structural changes due to unfolding. We have analyzed the helix content in the gas-phase simulations and see that helix 1 in ROCKET unwinds within 10 ns but stays helical ca. 10 ns longer when bound to CDL. The AAXWA mutation stabilizes the helical conformation independently of CDL binding, but CDL tethers the folded helix closer to the core (see Figure 1G and H). We have added this information to the results section and the plot below to Figure S2.

      CG-MD force experiments could be corroborated experimentally with magnetic tweezer unfolding assays as has been performed for the unfolding of artificial protein TMHC2. Alternatively this work could benefit to referencing Wang et al 2019 "On the Interpretation of Force-Induced Unfolding Studies of Membrane Proteins Using Fast Simulations" to support MD vs experimental values.

      We apologize for the confusion here. The force experiments are gas-phase all-atom MD. The simulations show that the protein-lipid complex has a more stable tertiary structure in the gas phase. Since these are gas-phase simulations, they cannot be corroborated using in-solution measurements. Similarly, the paper by Wang et al is a great reference for solution simulations, however, to date the only validations for gas-phase unfolding come from native MS.

      Did the authors investigate if ROCKET or ROCKETAAXWA copurifies with endogenous lipids? Membrane proteins with stabilising CDL often copurify in detergent and can be detected by MS without the addition of CDL to the detergent solution. Differences in retention of endogenous lipid may also indicate differences in stability between the proteins and is worth investigation.

      We have investigated the co-purification of the ROCKET variants and did not observe any co-purified lipids (see Figure S4), which we have now clarified in the results section (page 5, third paragraph). We previously showed that long residence times in CG-MD are linked to the observation of co-purified lipids, because they are not easily outcompeted by the detergent (Bolla et al, Angew Chemie Int. Ed. 2020). In CG-MD of ROCKET, we see that although the CDL sites are nearly constantly occupied, the CDL molecules are in rapid exchange with free CDL from the bulk membrane. For MS, all ROCKET proteins were extracted from the E. coli membrane fraction with DDM, which likely outcompetes CDL. This interpretation would explain why we see significant CDL retention when the protein is released from liposomes, but not when the protein is first extracted into detergent. For GlpG, CDL residence times in CG-MD are longer, which agrees with CDL co-purification. Similarly, there is clearly an enrichment of CDL when the protein is extracted into nanodiscs (Sawczyc et al, Nature Commun 2024).

      Do the AAXWA and ROCKET have significantly similar intensities from nMS? The AAXWA appears to show slightly lower intensities than the ROCKET.

      We did not observe a significant difference. However, in most spectra, the AAXWA peaks have a lower intensity than those of the other variants (see e.g. Figure S5). While this could be due to batch-to-batch variation, there may be a small contribution from the lower number of basic residues (see Abramsson et al, JACS Au 2021). However, there is an excess of basic residues in the soluble domain of ROCKET, so this interpretation is speculative.

      Can the authors extend their comments on why densities are observed only around site 2 in the cryo-EM structures when site 1 is the apparent preferential site for ROCKET?

      We base the lipid preference of Site 1 > Site 2 on the CG MD data, where we see a higher occupancy for site 1. At the same time, as noted in the text, CDL at both sites has rather short residence times. When the protein is solubilized in detergent, these times can change, and lipids in less accessible sites (such as cavities and subunit interfaces) may be subject to a slower exchange than those that are fully exposed to the micelle (Bolla et al, Angew Chemie Int. Ed. 2020). We speculate that this effect may favor retaining a lipid at site 2. Furthermore, site 1 is flexible, with CDL attaching at various angles, while site 2 has more uniform CDL orientations (see CDL density plot in Figure 1D). EM is likely biased towards the less flexible site. Notably, the density is still poorly defined, so it is possible that a more variable lipid position in site 1 would not yield a notable density at all. We have added this information to the Results section (page 5, second paragraph).

      The authors state that nMS is consistent with CDL binding preferentially to Site 1 in ROCKET and preferentially to Site 2 in the ROCKET AAXWA variant, yet it is unclear from the text exactly how these experiments demonstrate this.

      As outlined in the previous answer, we base our assessment of the sites on the CG MD simulations. There, we note that CDL binds predominantly to site 1 in ROCKET and predominantly to site 2 in AAXWA. However, the overall occupancy is lower in AAXWA than in ROCKET, meaning fewer lipids will be bound simultaneously in that variant. The nMS data show CDL retention by both variants when released from liposomes, but AAXWA has lower-intensity CDL adduct peaks (Figure 2B, C). We interpret this to mean that both have CDL sites, but that the sites in the AAXWA variant have lower occupancy. We agree that this observation does not demonstrate that the CG MD data are correct. However, it is the outcome one expects based on the simulations, so we described it as “consistent with the simulations”. We have rephrased the section to make this clear.

      As carried out for ROCKET AAXWA the total CDL binding to A61P and R66A would add to supporting information of characterisation of lipid stabilising mutations.

      We considered this possibility too. Unfortunately, the mass differences between A61P / R66A and AAXWA are slightly too high to unambiguously resolve CDL adducts of each variant, as the 1st CDL peak of AAXWA partially overlaps with the apo peak of A61P or R66A.

      Did the authors investigate a double mutation to Site 2 (e.g. R66A + M16A)?

      While designing mutants, we tested several double mutants involving the basic residues that bind the CDL headgroups (e.g. R66 + AAXWA) but found that they could not be purified, probably because a minimum of positive residues at the N-terminus is required for proper membrane insertion and folding. M16 is an interesting suggestion, but wasn’t considered because the more subtle effects of non-charged amino acids on CDL binding may be lost during desolvation (see also our response to Comment (i) from reviewer 1).

      Was the stability of R66A ever compared to the WT or only to AAXWA?

      Some of the ROCKET mutants have very similar masses that cannot be resolved well enough on the ToF instrument. While the R66A-WT comparison is possible, we would not be able to compare it to A61P or D7A/S8R. To avoid three-point comparisons, we selected AAXWA as the common point of reference for all variants.

      How many CDL sites in the database used are structurally verified?

      At the time, 1KQF was the only verified E. coli protein with a CDL resolved in a high-resolution structure. The complex was predicted accurately, see Figure 6A in Corey et al (Sci Adv 2021), as were several non-E. coli complexes.

      The work on GlpG could benefit from mutagenesis or discussion of mutagenesis to this site. The Y160F mutation has already been shown to have little impact on stability or activity (Baker and Urban Nat Chem Biol. 2012).

      We thank the referee for their excellent suggestion. While Y160F did not have a pronounced effect, the other 3 positions of the predicted CDL binding site in GlpG have not been covered by Baker and Urban. Looking at sequence conservation in GlpG orthologs, manually sampling down to 50% identity (~1300 sequences in UniProt) shows that Y160 and K167 are conserved, R92 varies between K/R/Q, whereas W98 is not conserved. The other (weak) site cited above (K132 and K191) is not conserved. A detailed investigation of how the conserved residues impact CDL binding and activity is already planned for a follow-up study focusing on GlpG biology.

      Reviewer #3 (Public review):

      Summary:

      The relationships of proteins and lipids: it's complicated. This paper illustrates how cardiolipins can stabilize membrane protein subunits - and not surprisingly, positively charged residues play an important role here. But more and stronger binding of such structural lipids does not necessarily translate to stabilization of oligomeric states, since many proteins have alternative binding sites for lipids which may be intra- rather than intermolecular. Mutations which abolish primary binding sites can cause redistribution to (weaker) secondary sites which nevertheless stabilize interactions between subunits. This may be at first sight counterintuitive but actually matches expectations from structural data and MD modelling. An analogous cardiolipin binding site between subunits is found in E.coli tetrameric GlpG, with cardiolipin (thermally) stabilizing the protein against aggregation.

      “It’s complicated” We could not have phrased the main conclusions of our study better.

      Strengths:

      The use of the artificial scaffold allows testing of hypotheses about the different roles of cardiolipin binding. It reveals effects which are at first sight counterintuitive and are explained by the existence of a weaker, secondary binding site which unlike the primary one allows easy lipid-mediated interaction between two subunits of the protein. Introducing different mutations either changes the balance between primary and secondary binding sites or introduces a kink in a helix - thus affecting subunit interactions which are experimentally verified by native mass spectrometry.

      Weaknesses:

      The artificial scaffold is not necessarily reflecting the conformational dynamics and local flexibility of real, functional membrane proteins. The example of GlpG, while also showing interesting cardiolipin dependency, illustrates the case of a binding site across helices further but does not add much to the main story. It should be evident that structural lipids can be stabilizing in more than one way depending on how they bind, leading to different and possibly opposite functional outcomes.

      We share the reviewer’s concern, as we clearly observe that TMHC4_R does not have the same type of flexibility as a natural protein. We find that by introducing flexibility, we start to see CDL-mediated effects. To test the validity of our findings from the artificial system, we apply them to GlpG. In response to a suggestion from Reviewer 1, we compared the findings to Aac2, and found that its stabilizing CDL site closely resembles that in GlpG (see new Figure S8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      There are a number of typos/uncorrected statements in the text.

      i) The last sentence of the Abstract appears to be an uncorrected mishmash of two.

      ii) Line 66: "protects" should be just "protect"

      iii) Line 75: Sentence appears to be incomplete. "...associated changes in protein stability." The word "stability" is missing.

      We have made these changes.

      iv) Fig. 2E. Are the magenta and blue colors inverted for variants 1 and 2?

      No, the color is correct. Greater stabilization of the blue tetramer (AAXWA) compared to WT (purple) will lead to fewer blue monomers than purple monomers in the mass spectrum.

      v) Line 274: the salt bridge should be between R8-E68.

      We have corrected this.

      vi) Lines 350-354 (final sentence of the paragraph): The sentence does not read well (especially with the double negative element). Please reconstruct the sentence and/or break it into two. 

      We have split the sentence in two.

      Suggestions:

      (i) While aromatic residues (in particular Trp) appear to be clearly involved in the CDL interaction, there is no investigation of their roles and contributions relative to the positively charged residues (R and K) investigated here. How do aromatics contribute to CDL binding and protein stability, and are they differential in nature (W vs Y vs F)?

      See our response to comment (i) from reviewer 1. In short, subtle contributions to lipid interactions (such as pi stacking with Trp or Tyr) will likely be lost during transfer to the gas phase. As also noted in our response to the last comment from reviewer 2, we plan to use solution-phase activity assays to investigate the effect of Trp on CDL binding to GlpG. However, this is beyond the scope of the current study.

      (ii) In the case of GlpG, a WR pair (W136-R137) present at the lipid-water interface on the periplasmic face (adjacent to helices 2/3) may function akin to the W12-R13 of ROCKET in specifically binding CDL. Investigation of this site might prove to be interesting if it indeed does.

      We added the CDL density plot for the periplasmic side to Figure S7 and discuss further sites in GlpG in the Discussion section. See response to point (ii) above for details.

      Reviewer #2 (Recommendations for the authors):

      Minor comments

      - Typo in abstract line 39-40

      - Typo in figure legend of Fig 1 line 145

      - Typo in line 149, missing R66 in residues shown as sticks description

      - Lines 165-167 could benefit from describing what residues are represented as sticks

      We have made these changes.

      - Line 263 should refer to the figure where the tetrameric state was not affected by this mutation.

      The full spectrum of the A61P mutant is not included in the figure, hence there is no reference.

      - Addition of statistics to Fig. 4F ?

      We have added significance indicators to the graph and information about the statistics to the legend.

      Reviewer #3 (Recommendations for the authors):

      Minor issues

      l39: rewrite

      We have made these changes.

      l60: provide evidence for what is presented as a general statement - cardiolipins might also regulate function without affecting oligomeric state, e.g. MgtA

      This is a good point. We have added references to two examples where CDL acts without affecting oligomerization (MgtA, Weikum et al, BBA 2024, and Aac2, Senoo et al, EMBO J 2024).

      l74: not every functional interaction comes with a thermal shift

      We use thermal shift as a proxy because it indicates tight interactions, even if they may not be functional. We have made this distinction clearer in the text.

      l78: this is true for electrostatic interactions such as are at play here, but not necessarily for hydrophobic ones

      l133: in what direction is the pulling force applied - the figure seems to suggest diagonally?

      The pull coordinate is defined as the distance between the centers of mass of the two helices. The direction of the pull coordinate in Cartesian coordinate space is thus not fixed.
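
      The definition above can be illustrated with a minimal sketch. It is not the simulation setup used in the study: the coordinates, masses, and function names are hypothetical and only show that a center-of-mass distance is a scalar with no fixed direction in Cartesian space.

```python
import math

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms (x, y, z tuples)."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for c, m in zip(coords, masses)) / total
        for i in range(3)
    )

def pull_coordinate(coords_a, masses_a, coords_b, masses_b):
    """Scalar distance between the centers of mass of two atom groups."""
    com_a = center_of_mass(coords_a, masses_a)
    com_b = center_of_mass(coords_b, masses_b)
    return math.dist(com_a, com_b)

# Hypothetical two-atom "helices" with equal masses (illustrative values only).
helix1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # center of mass at (1, 0, 0)
helix2 = [(1.0, 3.0, 4.0), (1.0, 3.0, 4.0)]   # center of mass at (1, 3, 4)
masses = [12.0, 12.0]

distance = pull_coordinate(helix1, masses, helix2, masses)
```

      Because only the distance enters the coordinate, the vector connecting the two centers of mass is free to reorient as the helices move.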

      fig 1f, l159: "dissociating" meaning separation of subunits? the placement of the lipid within one subunit would not suggest that intermolecular interactions are properly represented here, please clarify

      The lipid placement in the schematic is not representative since the lipid occupies different spaces in WT and AAXWA; we have noted this in the legend. Regarding line 159, “Dissociation” is not strictly correct, since we measure the force to separate helix 1 and 2, i.e. unfolding. We have changed the wording to “unfolding”.

      l173: was there any evidence in EM data for monomers or smaller oligomers?

      No smaller particles were identified by visual inspection or in the particle classes. We have noted this in the methods section.

      l203: were tetramer peaks isolated separately for CID?

      C8E4 can cause some activation-dependent charge reduction, which could allow some tetramers to “sneak out” of the isolation window. We used global activation without precursor selection which subjects all ions to activation.

      fig 2c: can you indicate the 3rd lipid binding as it seems to be in the noise

      We can unambiguously assign the retention of three CDL molecules for the 17+ charge state only, and have clarified this in the legend to Figure 2.

      fig3: can you pls clarify what is meant by stabilization here - less monomer in case A means a more stable oligomer, but "A > B" should lead to ratios < 50%. This does not help with understanding what "stabilization" means in panels c-f, please define what the y axis means for these. Please also explain the bottom panels (side view) in each case, what do the dots represent?

      We apologize for the oversight of not explaining the side views; we have added a legend. The schematic in panel A is correct (compare the schematic in Figure 2 E). If tetramer A (blue) is stabilized by CDL more than tetramer B (“CDL stabilization A>B”), there will be fewer monomers ejected from A. If fewer A monomers are detected in the presence of CDL, then the ratio of B/(B+A) will go up.
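
      The arithmetic behind this ratio can be sketched as follows. The intensities are hypothetical and are not taken from the spectra in the study; the function name is likewise an illustrative assumption.

```python
def monomer_ratio(monomers_a, monomers_b):
    """Fraction of released monomers that come from tetramer B: B / (B + A)."""
    total = monomers_a + monomers_b
    if total <= 0:
        raise ValueError("at least one monomer signal must be positive")
    return monomers_b / total

# Hypothetical monomer intensities (arbitrary units).
# Without CDL, both tetramers release the same amount of monomer.
ratio_without_cdl = monomer_ratio(monomers_a=100.0, monomers_b=100.0)

# With CDL, tetramer A is stabilized more than B, so A releases fewer monomers.
ratio_with_cdl = monomer_ratio(monomers_a=60.0, monomers_b=100.0)

# The ratio rises above 0.5 when A is the more strongly stabilized tetramer.
shift = ratio_with_cdl - ratio_without_cdl
```

      In this reading, a ratio above 50% indicates that CDL stabilizes A more than B, and a ratio below 50% indicates the opposite.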

      It is not very clear what consequences the kink introduced by proline has for intra- vs. intermolecular interactions - the cartoons don't help much here

      We agree, the A61P impact on the structure is subtle. The small kink it introduces is not really visible in the top view, and hence, we tried to emphasize this in the side view. We have clarified the meaning of the side view schematics in the legend.

      l360: is that an assumption made here or is there evidence for displacement? native MS could potentially prove this.

      This is an assumption based on the fact that we see very little binding of POPG in the mixed bilayer CG-MD. We have clarified this in the text. Measuring this with MS is an interesting idea, but we have no direct measurement of displacement, since addition of CDL and POPG to the protein in detergent would result in binding to other sites as well.

      fig 4d: there is not much POPG density visible at all - why is that?

      Both plots use the same absolute scale. There is simply much less POPG binding compared to CDL.

      fig 4e: is this released protein already dissociated into monomers due to denaturation or excessive energy (CID product) - please comment.

      The CID energy for the spectrum in Figure 4E was selected to show partial dissociation and monomer release at higher voltages (220V in this case). At lower voltages (150V-170V) we do not observe dissociation in C8E4, see Figure S4A.

      l363: pls comment on the apparent discrepancy between single lipid binding and double density

      We added a clarifying sentence regarding the double lipids. The density seen in the published structure is of four lipid tails next to each other, which is what one would expect for a CDL. Since the CDL could not be resolved unambiguously, two phospholipids with two acyl chains each were modeled into the density instead. Our MS and MD data strongly suggest that the density stems from a single CDL.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public Review):

      Strengths:

      The manuscript utilizes a previously reported misfolding-prone reporter to assess its behaviour in ER in different cell line models. They make two interesting observations:

      (1) Upon prolonged incubation, the reporter accumulates in nuclear aggregates.

      (2) The aggregates are cleared during mitosis. They further provide some insight into the role of chaperones and ER stressors in aggregate clearance. These observations provide a starting point for addressing the role of mitosis in aggregate clearance. Needless to say, going ahead understanding the impact of aggregate clearance on cell division will be equally important.

      Weaknesses:

      The study almost entirely relies on an imaging approach to address the issue of aggregate clearance. A complementary biochemical approach would be more insightful. The intriguing observations pertaining to aggregates in the nucleus and their clearance during mitosis lack mechanistic understanding. The issue pertaining to the functional relevance of aggregation clearance or its lack thereof has not been addressed. Experiments addressing these issues would be a terrific addition to this manuscript.

      We have performed protein blotting and proteomics to characterize ER-FlucDM-eGFP expressing cells. We have also provided evidence to support the role of ER reorganization in regulating aggregate clearance. Our proteomic analysis provided a global view of the cellular state of cells expressing ER-FlucDM-eGFP, which potentially revealed functional relevance of ER-FlucDM-eGFP. Details are explained in the following comments. 

      Reviewer #2 (Public Review):

      Summary:

      The authors provide an interesting observation that ER-targeted excess misfolded proteins localize to the nucleus within membrane-entrapped vesicles for further quality control during cell division. This is useful information indicating transient nuclear compartmentalization as a quality control strategy for misfolded ER proteins in mitotic cells, although endogenous substrates of this pathway are yet to be identified.

      Strengths:

      This microscopy-based study reports unique membrane-based compartments of ER-targeted misfolded proteins within the nucleus. Quarantining aggregating proteins in membrane-less compartments is a widely accepted protein quality control mechanism. This work highlights the importance of membrane-bound quarantining strategies for aggregating proteins. These observations open up multiple questions on proteostasis biology. How do these membrane-bound bodies enter the nucleus? How are the single-layer membranes formed? How exactly are these membrane-bound aggregates degraded? Are similar membrane-bound nuclear deposits present in post-mitotic cells that are relevant in age-related proteostasis diseases? Etc. Thus, the observations reported here are potentially interesting.

      Weaknesses:

      This study, like many other studies, used a set of model misfolding-prone proteins to uncover the interesting nuclear-compartment-based quality control of ER proteins. The endogenous ER-proteins that reach a similar stage of overdose of misfolding during ER stress remain unknown.

      We have included a previous study that showed accumulation of BiP aggregates in the nucleus upon overexpression of BiP (Morris et al., 1997; DOI: 10.1074/jbc.272.7.4327) in the discussion (Line 299).

      The mechanism of disaggregation of membrane-trapped misfolded proteins is unclear. Do these come out of the membrane traps? The authors report a few vesicles in living cells. This may suggest that membrane-untrapped proteins are disaggregated while trapped proteins remain aggregates within membranes.

      We initially made mStayGold-Sec61β to image the ER structures and ER-FlucDM-eGFP aggregates. However, we could not obtain convincing time-lapse images to show the release of ER-FlucDM-eGFP aggregates from the ER membrane as there are abundant ER structures present close to the aggregates during mitosis, preventing the differentiation of the membrane encapsulating aggregates from the ER structures. 

      The authors figure out the involvement of proteasome and Hsp70 during the disaggregation process. However, the detailed mechanisms including the ubiquitin ligases are not identified. Also, is the protein ubiquitinated at this stage?

      We performed cycloheximide chase experiments in cells released from the G2/M boundary and found that the ER-FlucDM-eGFP protein level did not fluctuate significantly when cells progressed through mitosis and cytokinesis. Thus, we did not consider protein ubiquitination and degradation of ER-FlucDM-eGFP as a major mechanism for its clearance. We have included this observation in the results (Figure S7A; Line 266) and in the discussion (Line 324) of the revised manuscript.

      This paper suffers from a lack of cellular biochemistry. Western blots confirming the solubility and insolubility of the misfolded proteins are required. This will also help to calculate the specific activity of luciferase more accurately than estimating the fluorescence intensities of soluble and aggregated/compartmentalized proteins. 

      We performed a solubility test in cells expressing ER-FlucDM-eGFP and detected insoluble ER-FlucDM-eGFP after heat stress (Figure S1E; Line 102). We have also performed protein blotting to detect ER-FlucDM-eGFP to normalize the luciferase activity (Line 609). We have updated the method section for luciferase measurement (Line 494).

      Microscopy suggested the dissolution of the membrane-based compartments and probably disaggregation of the protein. This data should be substantiated using Western blots. Degradation can only be confirmed by Western blots. The authors should try time course experiments to correlate with microscopy data. Cycloheximide chase experiments will be useful.

      We performed cycloheximide chase experiments in cells released from the G2/M boundary and found that the ER-FlucDM-eGFP protein level did not fluctuate significantly when cells progressed through mitosis and cytokinesis (Figure S7A to S7C). Also, live-cell imaging of cells released from the G2/M boundary indicated no significant change in the total fluorescence intensity of ER-FlucDM-eGFP (Figure S7D). Thus, we do not think that protein degradation of ER-FlucDM-eGFP is the major mechanism for its clearance.

      The cell models express the ER-targeted misfolded proteins constitutively that may already reprogram the proteostasis. The authors may try one experiment with inducible overexpression.

      We have re-transduced fresh MCF10A cells with lentiviral particles to induce expression of ER-FlucDM-eGFP. The aggregates started to form after 24 h post-transduction. We made similar observations as described in the manuscript (e.g. aggregate clearance) two days after re-transduction.

      It is clear that a saturating dose of ER-targeted misfolded proteins activates the pathway. The authors performed a few RT-PCR experiments to indicate the proteostasis-sensitivity. Proteome-based experiments will be better to substantiate proteostasis saturation.

      We have performed proteomic analysis in cells expressing ER-FlucDM-eGFP and observed up-regulation of multiple proteins involved in the ER stress response, indicating that cells expressing ER-FlucDM-eGFP experience proteostatic stress (Figure S4A; Line 179).  

      The authors should immunostain the nuclear compartments for other ER-membrane resident proteins that span either the bilayer or a single layer. The data may be discussed.

      We have co-expressed ER-FlucDM-mCherry and mStayGold-Sec61β and detected mStayGold-Sec61β around ER-FlucDM-mCherry aggregates (Figure 1B).

      All microscopy figures should include control cells with similarly aggregating proteins or without aggregates as appropriate. For example, is the nuclear-targeted FlucDM-EGFP similarly entrapped? A control experiment will be interesting. Expression of control proteins should be estimated by western blots.

      We targeted FlucDM-eGFP to the nucleus by expressing NLS-FlucDM-eGFP (Figure S1A). We found that the nuclear FlucDM-eGFP did not co-localize with the ER-FlucDM-mCherry aggregates (Figure S1B; Line 96). We have also determined the expression levels of NLS-FlucDM-eGFP and ER-FlucDM-mCherry (Figure S1C and S1D).

      There are few more points that may be out of the scope of the manuscript. For example, how do these compartments enter the nucleus? Whether similar entry mechanisms/events are ever reported? What do the authors speculate? Also, the bilayer membrane becomes a single layer. This is potentially interesting and should be discussed with probable mechanisms. Also, do these nuclear compartments interfere with transcription and thereby deregulate cell division? What about post-mitotic cells? Similar deposits may be potentially toxic in the absence of cell division. All these may be discussed.

      Thank you for the interesting suggestions for our study. We speculated that ER-FlucDM-eGFP aggregates may derive from the invagination of the inner nuclear membrane given that the aggregates are in close proximity to the inner nuclear membrane in interphase cells (Line 299). We have included a previous study that reported a similar aggregate upon BiP overexpression (Morris et al., 1997; DOI: 10.1074/jbc.272.7.4327; Line 300). Our proteomic analysis showed that cells expressing ER-FlucDM-eGFP have several up-regulated proteins related to cell cycle regulation (Figure S4A; Line 346).

      Reviewer #3 (Public Review):

      Summary:

      This paper describes a new mechanism of clearance of protein aggregates occurring during mitosis.

      The authors have observed that animal cells can clear misfolded aggregated proteins at the end of mitosis. The images and data gathered are solid, convincing, and statistically significant. However, there is a lack of insight into the underlying mechanism. They show the involvement of the ER, ATPase-dependent, BiP chaperone, and the requirement of Cdk1 inactivation (a hallmark of mitotic exit) in the process. They also show that the mechanism seems to be independent of the APC/C complex (anaphase-promoting complex). Several points need to be clarified regarding the mechanism that clears the aggregates during mitosis:

      • What happens in the cell substructure during mitosis to explain the recruitment of BiP towards the aggregates, which seem to be relocated to the cytoplasm surrounded by the ER membrane.

      We have included images to show that BiP co-localizes with ER-FlucDM-eGFP aggregates in interphase cells (Figure S5C). We think that BiP participates in the formation of ER-FlucDM-eGFP aggregates during interphase instead of getting recruited to the aggregates during mitosis.

      • How the changes in the cell substructure during mitosis explain the relocation of protein aggregates during mitosis.

      We provided evidence to show that clearance of ER-FlucDM-eGFP aggregates involves the ER remodeling process. We depleted ER membrane fusion proteins ATL2 and ATL3 to perturb the distribution of ER sheets or tubules and found that cells were defective in clearing the aggregates (Figure 7A and B; Line 278). 

      • Why BiP seems to be the main player of this mechanism and not the cyto Hsp70 first described to be involved in protein disaggregation.

      In our proteomic analysis, we found that BiP (HSPA5), but not other Hsp70 family members, was up-regulated in cells expressing ER-FlucDM-eGFP (Line 352; Figure S4A). This explains why BiP is the main player in ER-FlucDM-eGFP aggregate clearance.

      Strengths:

      Experimental data showing clearance of protein aggregates during mitosis is solid, statistically significant, and very interesting.

      Weaknesses:

      Weak mechanistic insight to explain the process of protein disaggregation, particularly the interconnection between what happens in the cell substructure during mitosis to trigger and drive clearance of protein aggregates.

      In our revised manuscript, we now provide evidence to show that ER-FlucDM-eGFP aggregate clearance involves remodeling of the ER structures during mitotic exit. This is added as a new Figure 7 in the revised manuscript and is described in the result section (Line 278) and in the discussion section (Line 323). We believe that this addition provides mechanistic insights into ER-FlucDM-eGFP aggregate clearance.

      Recommendations for the authors:

      Reviewing Editor comments:

      I have read these reviews in detail and would like to recommend that the authors perform the experiments according to the reviewers' suggestions, as well as provide the appropriate controls raised by the reviewers.

      I think there are not that many requests and they all seem very reasonable and easily doable. I would recommend that the authors carry out the suggested experiments to develop a stronger story where the evidence transitions from being incomplete presently to a "more complete" standard.

      We have addressed questions raised by three reviewers and updated our manuscript (labeled in red in the main text).

      Reviewer #1 (Recommendations For The Authors):

      The manuscript makes exciting observations about the accumulation of reporter protein aggregates in the nucleus and its clearance during mitosis. It also provides some insight into the role of chaperones in aggregate clearance. These observations provide a good platform to perform in-depth analysis of the underlying mechanism and its functional relevance which perhaps the authors will plan over the long term. However, the suggestions below will help improve the current version of the manuscript:

      (1) Although it is assumed that the aggregates are cleared by the protein degradation mechanism, clear evidence supporting this assumption in the authors' experiments is lacking and needs to be provided. Is it possible that mitosis induces disassembly of these aggregates instead of degradation?

      We performed two experiments to verify whether ER-FlucDM-eGFP aggregates are cleared by the protein degradation mechanism. In the first experiment, we treated cells expressing ER-FlucDM-eGFP released from the G2/M boundary with cycloheximide (CHX) and found that ER-FlucDM-eGFP did not decrease in protein abundance in cells progressing through mitosis (Figure S7A to S7C). In the second experiment, we measured the intensity of ER-FlucDM-eGFP in early dividing cells and late dividing cells after release from the G2/M boundary and found that there was no significant difference between early and late dividing cells (Figure S7D). Thus, we concluded that protein degradation of ER-FlucDM-eGFP is not the primary mechanism of its clearance during cell division (Line 324). Furthermore, we included new data to show that the ER-FlucDM-eGFP aggregate clearance depends on ER reorganization during cell division, so mitotic exit induces disassembly of the aggregates instead of protein degradation.

      (2) It is intriguing that the aggregates are nuclear. Is the nuclear localization mediated by localization to ER? A time course analysis would reveal this and would provide credence to the idea that the reporter was originally expressed in the ER. It is currently unclear if the reporter ever gets expressed in ER.

      We showed that in interphase cells, ER-FlucDM-eGFP co-localizes with mStayGold-Sec61β, which labels the ER structures (Figure 1B). So, ER-FlucDM-eGFP is expressed and present in the ER network and invaginates into the inner nuclear membrane as aggregates. We attempted to image the formation of ER-FlucDM-eGFP aggregates; however, this was technically challenging because the aggregates appeared very small and were barely visible after clearance under our microscopy system.

      (3) It would be expected that the persistence of these aggregates would impact cell division and cellular health. An experiment addressing this hypothesis would be very useful in establishing the functional relevance of this observation in the context of the current study.

      We have performed proteomic analysis on cells expressing ER-FlucDM-eGFP and found that multiple proteins involved in the ER stress response were up-regulated (Figure S4A). Additionally, proteins related to cell cycle regulation were up-regulated upon expression of ER-FlucDM-eGFP (Figure S4A). The increase in these proteins may indicate perturbed cellular health (Line 344).

      (4) A recent report (PMID: 34467852) identified the role of ER tubules in controlling the size of certain misfolded condensates. Would specific ER substructures affect the nuclear localization and/or clearance of the FlucDM aggregates? This is tied to point#2 and would provide insights into the connection between ER and the nuclear aggregates.

      Thank you for your suggestions. We perturbed the ER remodeling process by knocking down ATL2 and ATL3, which are ER membrane fusion proteins, and found that clearance of ER-FlucDM-eGFP aggregates was affected (Figure 7A and B). Hence, perturbation of the distribution of ER tubules and ER sheets affects ER-FlucDM-eGFP aggregate clearance. We have also added to the discussion the recent paper on the role of ER tubules in controlling the size of misfolded condensates (Line 321).

      Reviewer #2 (Recommendations For The Authors):

      I expect that the images indicate z-sections. Should be indicated in legends as applicable.

      We have indicated whether the images are Z-stack or single Z-slices in the figure legends.  

      Small point: the control region (outside inclusion) that was bleached in 2c may be clearly indicated. 

      We have added the explanation in the figure legend of Figure 2C.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the neuroprotective effect of reserpine in a retinitis pigmentosa (P23H-1) model, characterized by a mutation in the rhodopsin gene. Their results reveal that female rats show better preservation of both rod and cone photoreceptors following reserpine treatment compared to males.

      Strengths:

      This study effectively highlights the neuroprotective potential of reserpine and underscores the value of drug repositioning as a strategy for accelerating the development of effective treatments. The findings are significant for their clinical implications, particularly in demonstrating sex-specific differences in therapeutic response.

      We sincerely appreciate the reviewer’s comments.

      Weaknesses:

      The main limitation is the lack of precise identification of the specific pathway through which reserpine prevents photoreceptor death.

      We acknowledge that the exact pathway through which reserpine exerts its protective effects on photoreceptors remains undetermined, yet our findings provide critical insights into potential mechanisms. Together with our previous report [PMID: 36975211], the studies presented here validate proteostasis (including autophagy) and p53 signaling as the key pathways underlying reserpine-mediated survival of photoreceptors in retinal disease models. We also go a step further by showing an influence of biological sex.

      We emphasize that the primary aim of this study was to demonstrate the effectiveness of reserpine in a different retinal degeneration model—specifically, the autosomal dominant RP model—which shares a retinal disease phenotype with the model used for initial screening but involves different genetic and molecular mechanisms of degeneration.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Sex-specific attenuation of photoreceptor degeneration by reserpine in a rhodopsin P23H rat model of autosomal dominant retinitis pigmentosa" by Beom Song et al., the authors explore the transcriptomic differences between male and female wild-type (WT) and P23H retinas, highlighting significant gene expression variations and sex-specific trends. The study emphasizes the importance of considering biological sex in understanding inherited retinal degeneration and the impact of drug treatments on mutant retinas.

      Strengths:

      (1) Relevance to Clinical Challenges: The study addresses a critical limitation in inherited retinal degeneration (IRD) therapies by exploring a gene-agnostic approach. It emphasizes sex-specific responses, which aligns with recent NIH mandates on sex as a biological variable.

      (2) Multi-dimensional Methodology: Combining electroretinography (ERG), optical coherence tomography (OCT), histology, and transcriptomics strengthens the study's findings.

      (3) Novel Insights: The transcriptomic analysis uncovers sex-specific pathways impacted by reserpine, laying the foundation for personalized approaches to retinal disease therapy.

      We are grateful to the reviewer for highlighting the strengths of our work.

      Weaknesses:

      Dose Optimization

      The study uses a fixed dose (40 µM), but no dose-response analysis is provided. Sex-specific differences in efficacy might be influenced by suboptimal dosing, particularly considering potential differences in metabolism or drug distribution.

      We acknowledge the limitation of using a fixed dose (40 µM) of reserpine in this study without conducting a comprehensive dose-response analysis. In the primary screens, the EC50 of reserpine was approximately 20 µM. We doubled the concentration for injection to account for the potential loss of reserpine during the in vivo procedures. As we observed the rescue effect of reserpine in mice, we used the same concentration for rats. The fixed-dose approach was chosen to maintain consistency with previous studies evaluating reserpine in retinal degeneration models and to facilitate comparison across studies. Efforts to identify optimal dosing were deprioritized, as the primary goal was different and this information cannot be directly translated to clinical applications.

      We also agree that sex-specific differences in efficacy might be influenced by suboptimal dosing, particularly given potential variations in metabolism, drug distribution, and pharmacokinetics between male and female rats. However, recent pharmacokinetic studies on systemically administered reserpine in rats reported no statistically significant covariates, including body weight, age, breed, or sex, affecting pharmacokinetic (PK) or pharmacodynamic (PD) parameters (Alfosea-Cuadrado, G. M., Zarzoso-Foj, J., Adell, A., Valverde-Navarro, A. A., González-Soler, E. M., Mangas-Sanjuán, V., & Blasco-Serra, A. (2024). Population Pharmacokinetic–Pharmacodynamic Analysis of a Reserpine-Induced Myalgia Model in Rats. Pharmaceutics, 16(8), 1101. https://doi.org/10.3390/pharmaceutics16081101). Furthermore, no evidence of sex-specific differences in reserpine pharmacokinetics has been previously identified in available databases (National Center for Biotechnology Information (2025). PubChem Compound Summary for CID 5770, Reserpine. Retrieved January 13, 2025 from https://pubchem.ncbi.nlm.nih.gov/compound/Reserpine). Importantly, the drug in this study was administered intravitreally, where the ocular compartments are relatively isolated from systemic metabolism or excretion. Under these conditions, where absorption, distribution, metabolism, and excretion have minimal impact, we observed sex differences in efficacy using the same dose of drug.

      Nonetheless, we agree with the reviewer and plan to pursue dose-response and other studies in future investigations.

      Statistical Analysis

      In my opinion, there is room for improvement. How were the animals injected? Was the contralateral eye used as control? (no information in the manuscript about it!, line 390 just mentions the volume and concentration of injections). If so, why not use parametric paired analysis? Why use a non-parametric test, as it is the Mann-Whitney U? The Mann-Whitney U test is usually employed for discontinuous count data; is that the case here? Therefore, please specify whether contralateral eyes or independent groups served as controls. If contralateral controls were used, paired parametric tests (e.g., paired t-tests) would be statistically appropriate. Alternatively, if independent cohorts were used, non-parametric Mann-Whitney U tests may suffice but require clear justification.

      We apologize for the lack of clarity. In line 124, we described the injection as “bilateral intravitreal injections of 5 µL of either vehicle or 40 µM reserpine,” and in Figure 1A, we annotated the bilateral injection as DMSO for both eyes and RSP for both eyes. To address this uncertainty, we added the clarification, “with each group receiving bilateral injections of either vehicle or reserpine” (lines 404–405). Since the results are not paired and involve continuous data for which the normality assumption cannot be confidently met or verified, we used the Mann-Whitney U test for statistical analysis.
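
To illustrate the test choice described above, here is a minimal sketch of a Mann-Whitney U computation for two independent groups of continuous measurements. The values and the function name `mann_whitney_u` are hypothetical and are not data or code from the study. The p-value uses the normal approximation without tie or continuity correction, which is an assumption made for brevity; a statistics package would normally be used instead.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    # U counts the pairs (xi, yj) in which xi exceeds yj; ties count as half.
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    # Under the null hypothesis, U has this mean and standard deviation
    # (no tie correction applied).
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Hypothetical response amplitudes for two independent (unpaired) groups.
vehicle = [101.0, 95.5, 110.2, 98.7, 104.1, 92.3]
reserpine = [120.4, 131.0, 115.8, 126.9, 118.2, 135.5]

u, p = mann_whitney_u(reserpine, vehicle)
print(f"U = {u:.1f}, p = {p:.4f}")
```

Because the test uses only the rank order of the observations, it does not require the normality assumption that a t-test would, which is the property the response relies on.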

      Sex-Specific Pathways

      The authors do identify pathways enriched in female vs. male retinas but fail to explicitly connect these to the changes in phenotype analysed by ERG and OCT. The lack of mechanistic validation weakens the argument.

      The study does not explore why female rats respond better to reserpine. Potential factors such as hormonal differences, retinal size, or differential drug uptake are not discussed.

      It remains open, whether observed transcriptomic trends (e.g., proteostasis network genes) correlate with sex-specific functional outcomes.

      We acknowledge that, while we identified pathways enriched in female versus male retinas, we did not explicitly connect these findings to the functional phenotypes measured by ERG and OCT. Although our transcriptomic data suggest that reserpine differentially influences pathways such as proteostasis and p53 signaling, we did not conduct mechanistic experiments to validate a causal relationship between these pathways and the observed outcomes.

      In practice, designing a study to validate the mechanisms of a small molecule modulating multiple pathways presents significant challenges. If the pathways cannot be specifically modulated or if modulation could result in irreversible outcomes, the mechanistic validation becomes difficult to achieve. Drugs demonstrating mutation-agnostic efficacy are often investigated primarily through outcome measures and the analysis of affected pathways rather than through direct mechanistic validation (Leinonen, H., Zhang, J., Occelli, L. M., Seemab, U., Choi, E. H., L P Marinho, L. F., Querubin, J., Kolesnikov, A. V., Galinska, A., Kordecka, K., Hoang, T., Lewandowski, D., Lee, T. T., Einstein, E. E., Einstein, D. E., Dong, Z., Kiser, P. D., Blackshaw, S., Kefalov, V. J., Tabaka, M., … Palczewski, K. (2024). A combination treatment based on drug repurposing demonstrates mutation-agnostic efficacy in pre-clinical retinopathy models. Nature communications, 15(1), 5943. https://doi.org/10.1038/s41467-024-50033-5).

      As recommended, we added potential factors that might influence the differential response to reserpine, based on other studies (lines 353–362) highlighting differences in dopamine storage capacity and estrogen independence. We also added a discussion on the possibility of sex-related differences in basal ERG response levels (lines 363–366).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study presents compelling findings on the neuroprotective effects of reserpine in a well-established model of retinitis pigmentosa (P23H-1). The use of ERG, optomotor assays, OCT, immunohistochemistry, and transcriptomic techniques provides a good exploration of the treatment's effects, particularly highlighting the differential response in females. The study underscores the potential of drug repurposing to expedite the availability of therapeutic interventions for patients.

      Thanks for your generous comments.

      While the manuscript presents an important contribution, I would like to highlight a few points that need clarification or further elaboration to strengthen the work:

      (1) Please include the photopic a-wave data in your analysis or provide a justification for its omission. Specifically, it would be valuable to know whether there is an improvement in this parameter under reserpine treatment.

      We appreciate the reviewer’s suggestion to include photopic a-wave data in our analysis and acknowledge the importance of this parameter in evaluating cone photoreceptor function. However, we did not analyze the photopic a-wave amplitude in our study because we found the photopic a-wave has low amplitude and high variability, consistent with findings in other studies with P23H-1 rats (Orhan E, Dalkara D, Neuillé M, Lechauve C, Michiels C, et al. (2015) Genotypic and Phenotypic Characterization of P23H Line 1 Rat Model. PLOS ONE 10(5): e0127319. https://doi.org/10.1371/journal.pone.0127319) or even with wild type rats (V.L. Fonteille, J. Racine, S. Joly, A.L. Dorfman, S. Rosolen, P. Lachapelle; Do Rats Generate a Photopic a–Wave? Invest. Ophthalmol. Vis. Sci. 2005;46(13):2246). We added the description (lines 435-437) explaining why the photopic a-wave was not analyzed. Studies with P23H-1 did not analyze the photopic a-wave, probably for similar reasons.

      (2) In Figure 1, it would be helpful to include data from normal control animals to provide a benchmark for retinal degeneration in P23H-1 animals and to better contextualize the effects of reserpine treatment.

      Thanks. As suggested, we have included data from normal control animals in Figure 1.

      (3) The manuscript states that "Treated female retinas have significantly higher expression of the gene for P62 (SQSTM1), indicating a potential key route for reserpine's activity" (Line 331). Please explain how this difference in expression might translate into a better photoreceptor response in females compared to males.

      The difference in P62 (SQSTM1) expression between treated female and male retinas could have important implications for the photoreceptor response. In our previous study, we found that reserpine increased P62, which mediates proteome balance between the ubiquitin-proteasome system (UPS) and autophagy. Together with its role in the regulation of oxidative stress, this suggests that P62 might be important for photoreceptor survival and function. Higher expression of P62 in treated females could suggest more efficient cellular maintenance and a better ability to cope with stress, leading to improved photoreceptor survival and function.

      (4) Numerous studies have shown that animal models of Parkinson's disease (e.g., those treated with MPTP or rotenone) or retinal tissue from Parkinson's patients exhibit dopaminergic cell death and associated vision loss. Please discuss how these findings relate to your results. Can you hypothesize how dopamine depletion by reserpine may lead to improved photoreceptor responses in your model?

      We appreciate the reviewer’s insightful comments. Both MPTP and rotenone act via inhibition of complex I of the respiratory chain, causing cell death and leading to dopamine depletion. In contrast, reserpine acts by inhibiting the vesicular monoamine transporter, depleting catecholamines by preventing their storage and facilitating their metabolism by monoamine oxidase. Although reserpine and other agents can induce animal models of Parkinson's disease, reserpine differs from the others in several aspects: (i) reserpine does not induce neurodegeneration and protein aggregation; (ii) motor performance, monoamine content, and TH staining are partially restored after treatment interruption; and (iii) reserpine lacks specificity regarding dopaminergic neurotransmission (Leão, A. H., Sarmento-Silva, A. J., Santos, J. R., Ribeiro, A. M., & Silva, R. H. (2015). Molecular, Neurochemical, and Behavioral Hallmarks of Reserpine as a Model for Parkinson's Disease: New Perspectives to a Long-Standing Model. Brain pathology (Zurich, Switzerland), 25(4), 377–390. https://doi.org/10.1111/bpa.12253). We have discussed the various effects of catecholamine depletion on retinal diseases (lines 331–337). Both dopamine receptor antagonists and agonists, as well as catecholamine depletion, can exert protective effects on the retina. The reduction in scotopic b-wave amplitude observed at P54, followed by a lack of further progression in degeneration, may support the hypothesis that reduced neuronal activity due to catecholamine depletion could have mitigated damage to retinal neurons.

      (5) For readers who may not be familiar with the P23H-1 mutation, it would be beneficial to include a brief description of the timeline and progression of retinal degeneration in this model.

      As the progression varies among studies, we have based our description on observations from the same facility where the animals were housed. The timeline and progression of retinal degeneration are briefly described in the results section (lines 112–115) and Supplementary Figure 1.

      (6) Do you have any data on the effects of reserpine treatment in older animals? If available, this could provide additional insight into the potential applicability of reserpine in later stages of disease progression.

      Unfortunately, we do not have data from older animals. As described in the results section (lines 116–124), we set the timepoint for interventions before functional impairment peaked, aiming to harness the remaining potential for rescue and promote functional improvement. Our approach focused on developing a gene-agnostic therapy that can delay disease progression and be delivered at an earlier stage than AAV-based therapies, using FDA-approved drugs.

      (7) Molecular Basis of Sex Differences: The molecular mechanisms underlying the differential responses in males and females should be elaborated upon. If possible, include a discussion or hypothesis that addresses these sex-specific differences at the molecular level.

      We thank the reviewer for highlighting the importance of addressing the molecular basis of sex-specific differences. In our study, we observed distinct transcriptomic responses to reserpine between male and female rats, particularly in molecular pathways related to proteostasis and p53 signaling. While the sex-specific differences in these molecular pathways remain to be fully evaluated, we have added a discussion on sex differences in reserpine responses, incorporating findings from other studies (lines 353–366).

      Reviewer #2 (Recommendations for the authors):

      (1) There is no mention in the manuscript about the fact that the transgene rats have several copies of rhodopsin and how this can affect these sex differences. Would it be the same in the P23H KO mouse? Or in other models with a single copy of the mutation?

      We have described in the Materials and Methods section how they were bred, but we did not specifically mention the allele status in the manuscript. Hemizygous P23H-1 rats used in this study carry a single P23H transgene allele with a transgene copy number of 9, in addition to the normal two wild-type opsin alleles. We added this description to remove the uncertainty (lines 384-387).

      (2) This sentence: in abstract lines 26 to 29: "Recently, we identified reserpine as a lead molecule for maintaining rod survival in mouse and human retinal organoids as well as in the rd16 mouse, which phenocopy Leber congenital amaurosis caused by mutations in the cilia-centrosomal gene CEP290 (Chen et al. eLife 2023;12:e83205. DOI: https://doi.org/10.7554/eLife.83205)", to my view, does not belong to the abstract, maybe in the introduction as state of the art.

      Thank you for asking. According to the guidelines for the research advance articles (that follow previously published studies), a reference to the original eLife article should be included in the abstract. As specified in the guidelines, we have updated the citation format to (author, year) for referencing eLife articles (line 29).

      (3) Lines 167-170: "Histologic evaluation of the retinas also demonstrated more prominent ONL thinning in the dorsal retina and increased ONL thickness in the dorsal retina measured at 1,000, 1,250, and 1,500 µm distant from the optic nerve head in reserpine-treated group compared with control group (Figure 3C)". I do not understand this sentence. Is it a more prominent thinning or an increased thickness?

      We apologize for the confusion caused by this sentence. The histological evaluation showed that ONL thinning was more pronounced in the dorsal retina of the control group, which was consistent with the OCT findings in Figure 3A. Reserpine treatment increased the ONL thickness in the dorsal retina at specific distances from the optic nerve head (1,000, 1,250, and 1,500 µm). We have revised the sentence for clarity (lines 165-168).

      (4) Lines 182-185 and Figure 4B: FL is not the best approach to quantify rhodopsin levels. Since the DAPI staining is overexposed, it is hard to evaluate the staining of RHO in the ONL. From the visible staining in the OS, it is only possible to affirm that the OS are longer in RSP-treated retinas... more is not to be affirmed based on these figures. I suggest using WB.

      We acknowledge the reviewer’s concern regarding the use of fluorescence imaging to quantify rhodopsin levels. While our current data highlight structural preservation, such as the length of the outer segments, we agree that drawing conclusions about rhodopsin levels from fluorescence staining is limited. As we do not have samples for WB and fluorescence imaging cannot quantify rhodopsin, we have revised the description (lines 180-184).

      (5) Lines 188-190 and Figure 4C: The images in 4C showed an extreme divergence between treated and untreated retina concerning the amount of stained cones, which is not observed at the quantification at 1000µm statistic. Are the images not representative?

      We agree with the reviewer that the images in Figure 4C may not adequately represent the quantified data. To address this, we have changed the figure to reflect the quantification results accurately.

      (6) Figures 6C-6D and 6G. Why do the authors not use any statistical analysis? Or are the differences not statistically significant? Why do authors use only WT and DMSO controls? What about untreated P23H controls (no DMSO)?

      Thanks for checking, and we apologize for the oversight. We have updated Figures 5, 6 and S5 to include adjusted p-values in the relevant plots. In addition, details of the significance thresholds are available in the supplementary tables. Regarding controls, untreated P23H retinas (without DMSO) were not included in the current analysis, as our experience shows that DMSO injection itself does not cause functional or structural changes. The key data demonstrating the effect of reserpine involve a comparison between the group treated with reserpine and the control group treated with DMSO, as the only difference between these groups is the involvement of the drug.

      (7) Validation of findings by testing key genes (e.g., p62/SQSTM1, Nrf2) using qPCR or immunohistochemistry will strengthen the findings.

      We appreciate the reviewer’s suggestion to validate key findings using qPCR or immunohistochemistry, as such experiments are crucial for further strengthening our conclusions. While this was not feasible in the current study due to various constraints, we fully recognize their importance and plan to incorporate these in our follow-up studies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Response to Public Reviews:

      We would like to thank the reviewers and editors once more for their time and effort in reviewing our manuscript. Below we discuss specifically our response to the recommendations of Reviewer 2, which were the only substantial changes we made to the manuscript.

      Reviewer 2 recommendation:

      "My only remaining suggestion is that the authors acknowledge and cite the work of other groups which have similarly found different subsets of LADs based on various molecular/epigenetic features:

      (1) doi.org/10.1101/2024.12.20.629719

      (2) PMID: 25995381

      (3) PMID: 36691074

      (4) PMID: 23124521 (fLADs versus cLADs, as described by the authors themselves) The exact subtypes of LADs might be different based on the features examined, but others have found/implicated the existence of different types of LADs. Hence, the pwv-LAD should be contextualized within these findings (which they do relative to v-fiLADs)."

      We thank the reviewer for this suggestion and for these references. We think that the best place to go into depth about how our work relates to these references would be in an appropriate review article.

      However, we did read these references carefully and responded, as described below, by adding additional clarifying text in the manuscript as well as mention of articles specifically relevant to our description of our results.

      (1) Reviewer 2 wrote specifically, "Hence, the pwv-LAD should be contextualized within these findings (which they do relative to v-fiLADs)"

      We are not sure exactly what Reviewer 2 means here. In this manuscript we defined p-w-v iLADs, not LADs. So, it would be inappropriate to compare a subset of iLAD regions with different types of LADs.

      If this was the meaning of Reviewer 2, then other readers might have similar confusion. Therefore, we added the following clarifying text in red:

      "Several previous studies have used varying approaches to subdivide LADs further into distinct subsets of LADs with different biochemical and/or functional properties (Martin et al., 2024; Meuleman et al., 2013; Shah et al., 2023; Zheng et al., 2015). However, in this Section we focused instead on asking whether regions specifically within iLADs might show differential localization relative to the lamina and/or nucleoli and, if so, whether these regions would show different levels of gene expression. More specifically, analogously to how gene expression hot-zones appeared as local maxima in speckle TSA-seq with early DNA replication timing, we asked whether iLAD regions that appeared as local maxima in lamina proximity mapping signals would correspond to iLAD regions with locally reduced gene expression levels and later DNA replication timing relative to their flanking iLAD sequences. Our rationale was that these iLAD regions might represent chromatin domains that together with their flanking iLAD regions would typically localize well within the nuclear interior but in a fraction of the cell population would loop back and attach at the nuclear periphery."
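
The idea of looking for local maxima of a lamina proximity signal within iLADs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the binned signal values, the iLAD mask, the function name `local_maxima_in_ilads`, and the `flank` parameter are all hypothetical.

```python
def local_maxima_in_ilads(signal, is_ilad, flank=2):
    """Return indices of iLAD bins whose signal strictly exceeds every
    bin within `flank` bins on either side (hypothetical criterion)."""
    peaks = []
    n = len(signal)
    # Bins closer than `flank` to either end lack full flanks and are skipped.
    for i in range(flank, n - flank):
        if not is_ilad[i]:
            continue
        # Flanking values on both sides, excluding the bin itself.
        window = signal[i - flank:i] + signal[i + 1:i + flank + 1]
        if all(signal[i] > v for v in window):
            peaks.append(i)
    return peaks


# Hypothetical binned lamina proximity signal (higher = closer to lamina)
# and a hypothetical mask marking which bins were called as iLAD.
signal = [-1.0, -0.8, -0.9, -0.2, -0.7, -1.1, 0.9, 1.2, 1.0, -0.6]
is_ilad = [True, True, True, True, True, True, False, False, False, True]

peaks = local_maxima_in_ilads(signal, is_ilad)
print(peaks)
```

In this toy input, bin 7 has the highest signal overall but lies in a LAD, so only the local peak inside the iLAD stretch is reported.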

      (2) We also added the following text near the end of the section about p-w-v iLADs to place them in the context of one class of "LADs" identified by ChIP-seq rather than DamID. We use quotation marks since the approach used produced a segmentation that included a nearly 50/50 mix of iLAD and LAD regions, as identified by DamID, for this class of domains.

      "We note that in a previous study a three-state Hidden Markov Model (HMM) segmented lamin B ChIP-seq data into two chromatin domain states with extensive overlap with LADs defined by lamina DamID (Shah et al., 2023). Whereas the late replicating, low gene density/expression "T1 LAD" state showed very high overlap (98%) with LADs defined by DamID, the intermediate replicating, intermediate gene expression "T2 LAD" state showed only 47% overlap with LADs defined by DamID. This was partly a result of the HMM segmentation algorithm but also due to substantial differences between the lamina ChIP-seq versus DamID signals for reasons that remain unclear. The subset of p-w-v iLADs included in T2 comprises only a small percentage of the total T2 LAD coverage, which includes both other iLAD and LAD regions. Thus, the p-w-v iLADs we identified here represent a novel and distinct class of iLAD chromatin domains, not previously described."

      (3) Alternatively, what Reviewer 2 might be suggesting implicitly is that we should start with the regions identified as p-w-v iLADs in one cell type and then identify all of those p-w-v iLADs which instead exist as LADs in a second cell type. Once we have identified their LAD equivalents in a second cell type we could then ask whether they possess special characteristics such that they correspond to a specific type of LAD subset. Finally, we could then ask how that specific type of LAD subset compared to the different subtypes of LADs identified by other groups and, in particular, the references Reviewer 2 provided.

      We agree that would be an interesting future direction, but we consider that as outside the scope of this current manuscript. We note that we did no such analysis of the characteristics of LADs which existed as p-w-v iLADs in a different cell line. We save that for a possible future analysis, ideally in the same cell types as used in the cited references to allow a more direct comparison.

      (4) Finally, we added text in the Discussion that relates our analysis of the differential SON and LMNB1 TSA-seq signals for different LAD regions, and how these correlate with different histone modifications, with results from the recent preprint cited by Reviewer 2. Note that we could not directly correlate our results from human cells with the three classes of LADs described in MEFs by this preprint.

      "Fourth, we show how LAD regions showing different histone marks (either enriched in H3K9me3, H3K9me2 plus H2A.Z, H3K27me3, or none of these marks) can differentially segregate within nuclei. These results support the previous suggestion of different "flavors" of LAD regions, based on the sensitivity of the autonomous targeting of BAC transgenes to the lamina to different histone methyltransferases (Bian et al., 2013). Differential nuclear localization also was recently inferred by the appearance of different Hi-C B subcompartments, which similarly were differentially enriched in either H3K9me3, H3K27me3, or the combination of H3K9me2 and H2A.Z (Spracklin et al., 2023). More recently, and while this paper was in revision, a new study described segmenting mouse embryonic fibroblast LADs into three clusters using histone modification profiling (Martin et al., 2024). Interestingly, these three LAD clusters also most notably differed by their dominant enrichment of either H3K9me3, H3K9me2, or H3K27me3. Thus, three orthogonal approaches have converged on identifying different LAD regions showing differential enrichment either of H3K9me3, H3K9me2, or H3K27me3. Here, our use of TSA-seq directly measures and assigns the intranuclear localization of these different LAD regions to different nuclear locales."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Gray and colleagues describe the identification of Integrator complex subunit 12 (INTS12) as a contributor to HIV latency in two different cell lines and in cells isolated from the blood of people living with HIV. The authors employed a high-throughput CRISPR screening strategy to knock down genes and assess their relevance in maintaining HIV latency. They had used a similar approach in two previous studies, finding genes required for latency reactivation or genes preventing it and whose knockdown could enhance the latency-reactivating effect of the NFκB activator AZD5582. This work builds on the latter approach by testing the ability of gene knockdowns to complement the latency-reactivating effects of AZD5582 in combination with the BET inhibitor I-BET151. This drug combination was selected because it has been previously shown to display synergistic effects on latency reactivation.

      The finding that INTS12 may play a role in HIV latency is novel, and the effect of its knockdown in inducing HIV transcription in primary cells, albeit in only a subset of donors, is intriguing. However, there are some data and clarifications that would be important to include to complement the information provided in the current version of the manuscript.

      We have now added the requested data and clarifications. In particular, we show that knockout of INTS12 has no effect on cell proliferation (new data added in Figure 2—figure supplement 3), we clarify how the degree of knockout and the complementation were accomplished, we clarify the differences between the RNA-seq and the activation scores, and we have bolstered the claim that INTS12 affected transcription elongation by performing CUT&Tag on Ser2 phosphorylation of the C-terminal tail of RNAPII along the length of the provirus (new data added in Figure 5C). Please see detailed responses below.

      Reviewer #2 (Public review):

      Summary:

      Identifying an important role for the Integrator complex in repressing HIV transcription and suggesting that by targeting subunits of this complex specifically, INTS12, reversal of latency with and without latency reversal agents can be enhanced.

      Strengths:

      The strengths of the paper include the general strategy for screening targets that may activate HIV latency and the rigor of exploring the mechanism of INTS12 repression of HIV transcriptional elongation. I found the mechanism of INTS12 interesting and maybe even the most impactful part of the findings.

      Weaknesses:

      I have two minor comments:

      There was an opportunity to examine a larger panel of latency reversal agents that reactivate by different mechanisms to determine whether INTS12 and transcriptional elongation are limiting for a broad spectrum of latency reversal agents.

      I felt the authors could have extended their discussion of how exquisitely sensitive HIV transcription is to pausing and transcriptional elongation and the insights this provides about general HIV transcriptional regulation.

      We have now added data on latency reversal agents of different mechanisms of action. We show that INTS12 affects HIV latency reversal from agents that affect the non-canonical NF-kB pathway (AZD5582), the canonical NF-kB pathway (TNF-alpha), activation via the T-cell receptor (CD3/CD28 antibodies), through bromodomain inhibition (I-BET151), and through a histone deacetylase inhibitor (SAHA). This additional data has been added to the manuscript in Figure 7, panels B and C as well as adding text to the discussion.

      We appreciate the suggestion to extend the discussion to emphasize how important pausing and elongation are to HIV transcription. Additionally, to further support our claim that INTS12 KO with AZD5582 & I-BET151 leads to an increase in elongation, which we previously supported with CUT&Tag data showing an increase in total RNAPII within HIV (Figure 5B), we measured RNAPII Ser2 phosphorylation (Figure 5C) and RNAPII Ser5 phosphorylation (Figure 5—figure supplement 2) and added these findings to the manuscript. Upon measuring Ser2 phosphorylation, a marker associated with elongation, we observed evidence of elongation-competent RNAPII in our AZD5582 & I-BET151 condition as well as our INTS12 KO with AZD5582 & I-BET151 condition, as we saw an increase of Ser2 phosphorylation within HIV. Despite seeing elongation-competent RNAPII in both conditions, we only saw a dramatic increase in total RNAPII for our INTS12 KO and AZD5582 & I-BET151 condition (Figure 5B), which supports the view that there are more elongation events and that an elongation block is overcome specifically with INTS12 KO paired with AZD5582 & I-BET151. This claim is further supported by our data showing an increase in virus in the supernatant only with the INTS12 KO with AZD5582 & I-BET151 condition in cells from PLWH (Figure 6C). We did not observe any statistically significant differences in RNAPII Ser5 phosphorylation, which might be expected as this mark is not associated with elongation (Figure 5—figure supplement 2).

      Reviewer #3 (Public review):

      Summary:

      Transcriptionally silent HIV-1 genomes integrated into the host's genome represent the main obstacle to an HIV-1 cure. Therefore, agents aimed at promoting HIV transcription, the so-called latency reactivating agents (LRAs), might represent useful tools to render these hidden proviruses visible to the immune system. The authors successfully identified, through multiple techniques, INTS12, a component of the Integrator complex involved in 3' processing of small nuclear RNAs U1 and U2, as a factor promoting HIV-1 latency and hindering elongation of the HIV RNA transcripts. This factor synergizes with a previously identified combination of LRAs, one of which, AZD5582, has been validated in the macaque model for HIV persistence during therapy (https://pubmed.ncbi.nlm.nih.gov/37783968/). The other compound, I-BET151, is known to synergize with AZD5582, and is an inhibitor of BET, factors counteracting the elongation of RNA transcripts.

      Strengths:

      The findings were confirmed through multiple screens and multiple techniques. The authors successfully mapped the identified HIV silencing factor at the HIV promoter.

      Weaknesses:

      (1) Initial bias:

      In the choice of the genes comprised in the library, the authors readdress their previous paper (Hsieh et al.) where it is stated: "To specifically investigate host epigenetic regulators involved in the maintenance of HIV-1 latency, we generated a custom human epigenome specific sgRNA CRISPR library (HuEpi). This library contains sgRNAs targeting epigenome factors such as histones, histone binders (e.g., histone readers and chaperones), histone modifiers (e.g., histone writers and erasers), and general chromatin associated factors (e.g., RNA and DNA modifiers) (Fig 1B and 1C)".

      From these figure panels, it clearly appears that the genes chosen are all belonging to the indicated pathways. While I have nothing to object to on the pertinence to HIV latency of the pathways selected, the authors should spend some words on the criteria followed to select these pathways. Other pathways involving epigenetic modifications and containing genes not represented in the indicated pathways may have been left apart.

      (2) Dereplication:

      From Figure 1 it appears that INTS12 alone reactivates HIV-1 from latency without any drug intervention, as shown by the MAGeCK score of DMSO-alone controls. If INTS12 knockdown alone shows antilatency effects, why, then, were they unable to identify it in their previous article (Hsieh et al., 2023)? The authors should include some words on the comparison of the results using DMSO alone with those of the previous screen that they conducted.

      (3) Translational potential:

      In order to propose a protein as a drug target, it is necessary to adhere to the "primum non nocere" principle in medicine. It is therefore fundamental to show the effects of INTS12 knockdown on cell viability/proliferation (and, advisably, T-cell activation). These data are not reported in the manuscript in its current form, and the authors are strongly encouraged to provide them.

      Finally, as many readers may not be very familiar with the general principles behind CRISPR Cas9 screening techniques, I suggest addressing them in this excellent review: https://pmc.ncbi.nlm.nih.gov/articles/PMC7479249/.

      (1) The CRISPR library used was more completely described in a previous publication (Hsieh et al., PLOS Pathogens, 2023). However, we now more explicitly refer the reader to information about the pathways targeted in the library. We also point out how initial hits in the library led to finding genes outside of the starting library, as in the follow-up screen in Figure 7, where each of the members of the INT complex is interrogated even though INTS12 was the only member in the initial library.

      (2) We understand the confusion between the hits in this paper and a previous publication. Indeed, INTS12 was observed in Hsieh et al., PLOS Pathogens, 2023 as a hit in the Venn diagram of Figure 3B of that paper, and in Figure 5A, right panel of that paper. However, it was not followed up on in the previous paper since that paper focused on a hit that was unique to increasing the potency of one particular LRA. We added text to the present manuscript to make it clear that the screens identified many of the same hits. We have also added additional data here on hit validation to underscore the reliability of the CRISPR screen. In one of the cell lines (5A8), EZH2 was a strong hit (Figure 1B). We have now added data that shows that an inhibitor to EZH2 augments the latency reversal of AZD5582/I-BET151 as predicted from the screen. This data has been added to Figure 1, figure supplement 1.

      (3) We appreciate the concern that for INTS12 to be a drug target, it should not be essential to cell viability. We now show that knockout of INTS12 has no effect on cell proliferation (new data added in Figure 2—figure supplement 3). In addition, the discussion now adds additional literature references that describe how knockout of INTS12 has relatively minor effects on cell functions in comparison to knockout of other INT members, which supports the proposal that modulation of INTS12 may be more specific than targeting the catalytic modules of Integrator. Nonetheless, we completely agree with the reviewer that many other aspects of how INTS12 affects T cell functions have not been addressed, nor have other potential detrimental effects of INTS12 as a drug target in vivo. We now more explicitly describe these caveats in the discussion but feel that the present manuscript is a first step with a long path ahead before the translational potential might be realized.

      (4) We now cite the review of CRISPR screens suggested by the reviewer.

      Responses to recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors report in the legend of Figure 2 (and similarly in other figures) that there was "a calculated INTS12 knockout score of 76% (for the one guide used) and 69% (for one of three guides used), respectively." However, it would be helpful to show representative data on the efficiency of INTS12 knockdown in cell lines and primary cells, as well as data on the efficiency of the complementation (Figure 2C).

      The knockout scores cited are genetic estimates of knockout efficiency based on sequence files. As the knockouts are done with multiple guides, the knockout score for each guide is an underestimate of the total knockout. The complementation, however, was done by adding back INTS12 in a lentiviral vector that also contains a drug resistance marker (puromycin). Cells were then selected for puromycin resistance, and therefore, all of them contain the complemented gene. What one would ideally like is a Western blot to quantify the amount of INTS12 remaining in the knockout pools. Unfortunately, despite obtaining multiple different commercial sources of INTS12 antibodies, we were unable to identify one that was suitable for Western blotting (as opposed to two that did work for CUT&Tag). Nonetheless, the functional data in primary T cells from PLWH and in J-Lat cell lines do show that, even if the knockout is suboptimal, we find activation after INTS12 knockout (e.g., Figure 6).

      (2) Flow cytometry methods are not reported, but was a viability dye included when testing GFP reactivation (Figure S2)? More broadly, showing data on the viability of cells post-knockdown and drug treatments would help, as cell mortality is inherently associated with latency reactivation in J-Lat cells. For the same reason, reporting viability data would be important for primary cells, as the electroporation procedure can lead to significant mortality.

      We did not include viability dyes in the data for GFP activation. However, as described in the public response, we have done growth curves in J-Lat 10.6 cells with and without INTS12 knockout and find no effects on cell proliferation (Figure 2—figure supplement 3). As the reviewer points out, it is not possible to do these experiments in primary cells since the electroporation itself causes a degree of cell death. Nonetheless, we do see effects on HIV activation in these primary cells (Figure 6).

      (3) Figure S2 shows a relatively high baseline expression (approximately 15%) of HIV-GFP, which is not unusual for the J-Lat 10.6 clone. However, Figure 3 appears to show no HIV RNA reads in the control condition of this same cell clone. How do the authors reconcile this discrepancy?

      We believe that the discrepancies in the flow cytometry versus RNA-seq assays are due to differences in the sensitivity of the assays, the linear range of the assays especially at the lower end, and the different half-lives of RNA versus protein. We now clarify that Figure 3 does not show “no” HIV RNA at baseline, but rather values of ~30 copies per million read counts. This increases to ~800 copies per million read counts when INTS12 knockout cells are treated with AZD5582/I-BET151. These values have the same fold change predicted in Figure 4, and more closely resemble the trend in Figure 2—figure supplement 1.
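
      The size of the change quoted above can be checked with a line of arithmetic. The sketch below is only an illustration using the approximate values stated in the response (~30 and ~800 copies per million read counts); the variable names are ours, and it is not the authors' RNA-seq analysis.

```python
import math

# Approximate values quoted in the response (copies per million read counts).
baseline_cpm = 30.0   # control condition, "~30"
treated_cpm = 800.0   # INTS12 knockout + AZD5582/I-BET151, "~800"

# Fold change and its log2 form, the scale commonly used for expression data.
fold_change = treated_cpm / baseline_cpm
log2_fold_change = math.log2(fold_change)

print(f"fold change ~ {fold_change:.1f}x, log2 fold change ~ {log2_fold_change:.2f}")
# prints: fold change ~ 26.7x, log2 fold change ~ 4.74
```

      Because both inputs are rounded, the result should be read as roughly a 25- to 30-fold increase rather than an exact figure.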

      (4) The combination of AZD5582 and I-BET151 consistently reactivates HIV latency (including GFP protein expression), as previously reported and as shown here by the authors. However, in Figure 5B, RPB3/RNAPII occupancy in the DMSO control appears higher than in the AAVS1KO + AZD5582 and I-BET151 samples. This should be discussed, as it could raise concerns about the robustness of RPB3/RNAPII occupancy results as a proxy for provirus elongation.

      As addressed in the public comments, in order to strengthen our claims about transcriptional elongation control, we measured RNAPII Ser2 and Ser5 phosphorylation levels. We see evidence of elongation with Ser2 in the condition of concern (AAVS1 KO + AZD5582 & I-BET151) as well as our main condition of interest (INTS12 KO + AZD5582 & I-BET151) and no change in Ser5 for any condition. With both the Ser2 phosphorylation and total RNAPII, as well as our virus release and transcription data, we believe that we are seeing evidence of increased elongation with INTS12 KO with AZD5582 & I-BET151. One potential nuance that may not be gathered from the CUT&Tag data is the turnover rate of the polymerase. Despite the levels of RNAPII appearing lower in the condition of concern (AAVS1 KO + AZD5582 & I-BET151) compared to DMSO, it is possible that low levels of elongation are occurring but that in our INTS12 KO + AZD5582 & I-BET151 condition there is more rapid elongation, and this is why we can observe more RNAPII within HIV. This new data is added in Figure 5C and Figure 5—figure supplement 2 and its implications are now described in more detail in the discussion.

      (5) The authors write that "Degree of reactivation was correlated with reservoir size as donors PH504 (star symbol) and PH543 (upside down triangle) have the largest HIV reservoirs (supplemental Figure S2)." I could not find mention of the reservoir size of these donors in the figure provided.

      This confusion was caused by mislabeling of the supplement number, which we fixed, and we added additional labeling to make finding the reservoir size even more clear as this is an important part of the manuscript. This is now found in Supplemental file S4.

      Reviewer #3 (Recommendations for the authors):

      (1) The MAGeCK gene score is a feature that is essential for the interpretation of the results in Figure 1. The authors do quote the Li et al. paper where this score was described for the first time (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0554-4), however, they may understand that not all readers may be familiar with this score. Therefore a didactic short description of this score should be done when introducing the results in Figure 1.

      We have added a short description to the paper to address this.

      (2) Figure 4. The authors write: "Among the host genes most prominently affected by INTS12 knockout with AZD5582 & I-BET151 are MAFA, MAFB, and ID2 (full list of genes in supplemental file S3)." I am a bit confused. In the linked Excel file there is only a list of a few genes. The differentially expressed genes appear to be many more from Figure 4. The full list should be uploaded.

      We believe there was a mistake in our original uploading and naming of the supplements. We have now double-checked the numbering of the supplements and added in-text clarification of which Excel tabs hold the desired information.

      (3) Figure 6: The authors are right in highlighting that there is a high level of variability in viral RNA in supernatants in the early stages of viral reactivation. It is therefore advisable to repeat measurements at Day 7, at which variability decreases and data are more reliable (please, see: https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(23)00443-7/fulltext).

      While it would have been nice to prolong these measurements, our current assay conditions are not optimal for longer term growth of the cells. We note that the measurements were all done in biological triplicates (independent knockouts) and in different individuals. Because the number of activatable latent proviruses is variable and the number of cells tested is limiting, the variability in the assays is expected.

      (4) Figure 7: The main genes outside the INTS family should be identified, also.

      We include the full list in supplemental file S5 and sort by most enriched.

      (5) Methods: A statistical paragraph should be added in the Methods section, detailing the data analysis procedures and the key parameters utilized (for example, which is the MAGeCK gene score threshold that they used to consider knockdown efficacy on HIV latency?).

      There is no MAGeCK score threshold that we use to determine efficacy on HIV latency. In a previous publication using CRISPR screens for HIV Dependency Factors (Montoya et al., mBio 2023), we showed that there is a relationship between the MAGeCK score and the effect of that gene knockout on HIV replication (Figure 5 of that paper). However, it is a continuum rather than a strict threshold and we believe that the effects on HIV latency would respond similarly. In the current paper, we have focused on the top hits rather than a comprehensive analysis of the entire list. In case the reviewer is referring to the average and standard deviation of the non-targeting controls, we have added this to the figure legend and methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:  

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.  

      Strengths: 

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures, and analyses are solid. The findings are interesting and novel. 

      Weaknesses: 

      It was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified.  

      We thank the reviewer for his/her summary and constructive feedback on our study. We appreciate the recognition of the strengths of our study.

      The second type of activation, namely the replay of feature-specific reactivations, is considered spontaneous because it reflects internally driven neural processes rather than responses directly triggered by external stimuli. Unlike responses evoked by stimuli, spontaneous replay is not time-locked to stimulus onset. Instead, it arises from the brain's intrinsic activity, typically observed during offline periods (e.g., rest or blank period) when external stimuli are absent. This allows the neural system to reactivate and consolidate prior experiences without interference from ongoing external stimuli.

      Replay is believed to be a key mechanism underlying various cognitive functions, such as memory consolidation (Gillespie et al., 2021; Gridchyn et al., 2020), learning (Igata et al., 2021), prediction and planning (Ólafsdóttir et al., 2018). Furthermore, the hippocampus and related cortical areas engage in replay to extract abstract relationships from sequential experiences, forming a "template" that can generalize across contexts (Liu et al., 2019). In our study, the feature-specific replay observed during blank periods likely reflects this process, supporting the integration of exposed motion direction sequences into cohesive memory representations and facilitating visual sequence learning.

      We have extended the Discussion section to incorporate this explanation (Lines 440 - 447).

      Regarding the second question, the procedural differences between the two types of activations lie in the classifiers used for the two analyses: a multiclass classifier for non-specific elevated responses and binary classifiers for feature-specific replay. 

      For the non-feature-specific elevated responses, we trained a five-class classifier (with labels for the four RDKs and the inter-trial interval, ITI) on the localizer data and tested it on the blank period in the main phase. We attempted to decode motion direction information at each time point at the group level. However, the results revealed no feature-specific information at the group level during the blank period.

      For the feature-specific replay, we employed temporally delayed linear modeling (TDLM) to examine whether individual motion direction information was encoded in a sequential and spontaneous manner. Here, we first needed to train four binary classifiers, each sensitive to only one motion direction (i.e., 0°, 90°, 180°, or 270°), as our aim was to quantify the evidence of feature-specific sequences in the subsequent analyses. For each classifier, positive instances were trials where the corresponding feature (e.g., 0°) was presented, while negative instances included trials with other features (e.g., 90°, 180°, and 270°) and an equivalent amount of null data from the ITI period (1–1.5 s).
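
      The construction of the training sets for the four binary classifiers can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_binary_training_set`, the data layout, and the toy feature vectors are hypothetical, and "an equivalent amount of null data" is interpreted here as one ITI sample per stimulus trial, which is an assumption.

```python
import random

DIRECTIONS = [0, 90, 180, 270]  # motion directions in degrees, as in the response


def build_binary_training_set(trials, null_samples, target, seed=0):
    """Return (samples, labels) for the classifier sensitive to `target`.

    trials: list of (features, direction) pairs from the localizer (hypothetical layout).
    null_samples: list of feature vectors taken from the ITI period.
    """
    positives = [x for x, d in trials if d == target]
    negatives = [x for x, d in trials if d != target]
    # Assumption: "equivalent amount" means as many null samples as stimulus trials.
    rng = random.Random(seed)
    nulls = rng.sample(null_samples, k=min(len(trials), len(null_samples)))
    samples = positives + negatives + nulls
    # Label 1 for the target direction; 0 for other directions and null data.
    labels = [1] * len(positives) + [0] * (len(negatives) + len(nulls))
    return samples, labels


# Toy data: 5 trials per direction, each with a 3-element placeholder feature vector.
toy_rng = random.Random(1)
trials = [([toy_rng.gauss(0, 1) for _ in range(3)], d)
          for d in DIRECTIONS for _ in range(5)]
null_samples = [[toy_rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]

# One training set per direction, i.e. four one-vs-rest problems.
training_sets = {d: build_binary_training_set(trials, null_samples, d)
                 for d in DIRECTIONS}
for d, (samples, labels) in training_sets.items():
    print(d, len(samples), sum(labels))
```

      Each resulting set could then be passed to any binary classifier; including the null data among the negatives is meant to keep a classifier from responding to stimulus-evoked activity in general rather than to its own direction.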

      We have clarified these methodological details in the Methods section (Pages 34 – 41).

      Reviewer #2 (Public review): 

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, moving dots in one direction, followed by another direction, etc.), showing either the starting movement direction or ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity actually does not carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies. 

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. However, this is the main problem with this paper. The statistical analysis is not explained well at all, and therefore its validity is hard to evaluate. I am not at all saying it is incorrect; what I am saying is that given how it is explained, it cannot be evaluated. 

      We thank the reviewer for the detailed evaluation as well as the acknowledgment of the novelty of our study.

      To address the concern about the statistical analysis, in the revised manuscript, we have modified the Methods section to provide a more detailed explanation of the analytical pipeline, particularly for several important aspects such as decoding probability and TDLM (Lines 646 – 657, Lines 682 – 734).

      Below, we provide point-by-point responses to further elaborate on these revisions and address the reviewer’s comments.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      I have questions.  

      (1) Participants were exposed to a predefined sequence of motion directions, either clockwise or counterclockwise. Is it possible that the observed replay is related to the activation of MST neurons? If a predetermined sequence is neither clockwise nor counterclockwise but is randomly determined, like 0° → 180° → 270° → 90°, would the same result be obtained?

      We thank the reviewer for these thoughtful questions.

      First, regarding the potential involvement of MST neurons, it is plausible that the observed replay might involve activity in motion-sensitive brain regions, including the medial superior temporal (MST) and even middle temporal (MT) areas. MST neurons, located in the extrastriate visual cortex, are highly direction-selective and are known for their sensitivity to complex motion patterns, such as rotations and expansions (Duffy & Wurtz, 1991; Saito et al., 1986). In our experiment, the use of RDKs with four distinct motion directions might elicit responses in MST neurons. However, due to the limited spatial resolution of MEG, we cannot provide direct evidence for this claim. 

      Second, regarding the impact of randomly ordered sequences, we believe that the replay patterns would still occur even if the sequences were randomly ordered (e.g., 0° → 180° → 270° → 90°). After a sequence is repeatedly exposed, the hippocampus has the capacity to encode abstract relationships in the sequence. Evidence supporting this view comes from previous studies. For example, Liu et al. (2019) showed that replay does not merely recapitulate visual experience but can also follow a sequence implied by learned abstract knowledge. In their study, participants were instructed that viewing pictures C→D, B→C, and A→B implies a true sequence of A→B→C→D. During subsequent testing, they observed replay events following this learned true sequence, even with novel visual stimuli, indicating that the brain maintains sequence knowledge independent of specific stimuli. Similarly, Ekman et al. (2023) showed that prediction-based neural responses could be observed when moving dots were presented in a random order rather than in a clockwise or counterclockwise order, which correspond to the four motion directions in our study.

      Together, these studies suggest that replay mechanisms in the brain are flexible and can encode and reproduce abstract relationships between sequential stimuli, regardless of their specific spatial contents. Therefore, we believe that even if the sequence were randomly ordered, the same backward replay pattern would still be observed.

      (2) Is it possible that the motion direction non-specific responses actually reflect the replay of another feature of the exposed sequence, namely, the temporally rhythmic presentations of the sequence, rather than suggested in the discussion?  

      We thank the reviewer for raising this insightful possibility.

      There is substantial evidence that rhythmic stimulation can entrain neural oscillations, which in turn facilitates predictions about future inputs and enhances the brain's readiness for incoming stimuli (Barne et al., 2022; Herrmann et al., 2016; Lakatos et al., 2008, 2013). In our study, the temporally rhythmic presentation of the motion sequence may have entrained oscillatory activity in the brain, leading to periodic activation of sensory cortices. This rhythmic entrainment could account for the observed nonspecific responses by reflecting the brain's temporal predictions rather than specific feature replay. 

      It is important to note, however, that this interpretation is in line with our initial explanation that the non-feature-specific elevated responses likely reflect a general facilitation of neural processes for any upcoming stimuli, rather than being tied to specific stimuli. The rhythmic entrainment mechanism provides another way to understand how the temporal structure in the sequences might contribute to the non-feature-specific elevated responses.

      We have revised the Discussion section to incorporate this interpretation, providing a more comprehensive account for the non-feature-specific elevated responses (Lines 428 – 439).

      Reviewer #2 (Recommendations for the authors): 

      The main problem with the paper is that the sophisticated statistical methodology is not explained well and therefore its validity is hard to evaluate. I am not at all saying it is incorrect, what I am saying is that given how it is explained, it cannot be evaluated.  

      See below for detailed point-by-point responses.  

      The first part is clear. There are 4 directions of motion, and there can also be a blank screen. The random decoding accuracy would be 20%. The decoding methods from the sensors yielded a little above 50% accuracy. This is clearly above chance, but much less than one would get from electrode recording of motion-selective cells in the cortex. However, the concept and methods used here seem clear, in contrast to what comes next.  

      Indeed, in the first step, we aimed to validate the reliability of our decoding model by applying a leave-one-out cross validation scheme to the localizer data. Our results showed that the decoding accuracy exceeded 50%, demonstrating robust decoding performance. However, due to the noninvasive nature of MEG and its low spatial resolution, the recorded signals represent population-level activity that inherently includes more noise compared to electrode recordings of motion-selective neurons. Therefore, the decoding accuracy in our study is understandably lower than that obtained with electrode recordings.

      Next, and most of the paper relies on this concept, they use the term decoding probability (Figure 2). What is the decoding probability measure (Turner 2023)? This is not explained in the methods section. I scanned the Turner et al 2023 paper referenced and could not find the term decoding probability there. In short, I have no idea what this means. What are these numbers between 0-0.3? How does this relate to accuracies above 50% reported? This is an important concept here, and it is used throughout the paper, so it makes it hard to evaluate the paper.  

      We apologize for the lack of clarity in our explanation of the term "decoding probability." Specifically, we used a one-versus-rest Lasso logistic regression model trained on the localizer data to decode the MEG signal patterns elicited by each motion direction during the main phase. The trained model can be used to predict a single label at each time point for each trial (e.g., labels 1 – 4 correspond to the four motion directions and label 5 corresponds to the ITI period). By comparing the predicted label with the true label across test trials, we computed the time-resolved decoding accuracy that we report.

      Alternatively, rather than predicting a single label for each time point and each trial, the model can also output the probabilities associated with each label/class (e.g., we used the predict_proba function in scikit-learn). This results in a 5-column output, where each column represents the probability of the corresponding class, and the sum of the probabilities across the five columns equals 1. Finally, at each time point, averaging these probabilities across trials yields five values that indicate the likelihood of the predicted stimulus belonging to each class.

      For example, Figure 2 in the manuscript depicts the decoding probabilities for the four RDKs (the probabilities for the ITI class are not shown in the figure). The number in a cell (between 0 and 0.3) indicates the probability of each class at a given time point (Figure 2A). The decoding probability does not have a direct relationship with the decoding accuracy. However, since there are five classes, the chance level of the decoding probability is 0.2. The highest probability among the five classes at a given time point determines the decoded label when computing the decoding accuracy.

      For illustration, in the left panel of Figure 2B, at the onset of the first RDK (0 s), the mean decoding probabilities for the classes 0°, 90°, 180°, 270°, and the blank ITI are 5%, 4.1%, 4.0%, 4.5%, and 82.4%, respectively. Thus, the decoded label should be the blank ITI. In contrast, 0.4 s after the onset of the first RDK, the mean decoding probabilities for the five classes are 28.0%, 19.0%, 22.8%, 21.2%, and 9.0%, respectively. Therefore, the decoded label should be 0°.
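      The relationship between decoding probability and decoded label described above can be sketched in a few lines of Python. This is a minimal illustration rather than our analysis code: the function names and the per-trial toy rows are hypothetical, and only the two sets of mean probabilities are taken from the example above. In practice the per-trial rows would come from the classifier's probability output (e.g., predict_proba).

```python
from statistics import mean

# Class order follows the text: four motion directions plus the blank ITI.
CLASSES = ["0 deg", "90 deg", "180 deg", "270 deg", "ITI"]


def mean_decoding_probability(trial_probs):
    """Average per-trial class probabilities (one row per trial) across trials."""
    return [mean(column) for column in zip(*trial_probs)]


def decoded_label(class_probs):
    """Return the class with the highest probability at one time point."""
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return CLASSES[best]


# Mean probabilities quoted above for the left panel of Figure 2B.
at_onset = [0.050, 0.041, 0.040, 0.045, 0.824]
at_400_ms = [0.280, 0.190, 0.228, 0.212, 0.090]

# Hypothetical per-trial rows, used only to show the averaging step.
toy_trials = [[0.4, 0.2, 0.2, 0.1, 0.1], [0.2, 0.2, 0.2, 0.3, 0.1]]
toy_mean = mean_decoding_probability(toy_trials)
```

      With these inputs, decoded_label(at_onset) returns the ITI class and decoded_label(at_400_ms) returns the 0° class, matching the two cases in the example.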

      We have revised the Methods section to explain this issue (Lines 646 – 657).

      They did find compressed reversed replay events (Figures 3-4). This is again confusing for several reasons. First, because they use the same unexplained decoding probability measure. Second, the optimal time point defined above depends on the start time of a stimulus, but here the start time is random. Third, the TDLM algorithm is hard to understand. For example, what are the reactivation probabilities of Figure 3C? They do make an effort to explain this in the methods section (lines 652-697) but it's not clear enough from the outset. For example, what is the state X_j? Is this a vector of activity of sensors? Are these decoding probabilities of the different directions? What is it? Also, what is X_i vs X_i(\Delta t)? Frankly, despite their efforts, I am very confused. Additionally, the figures use the term reactivation probability; where is it defined? So again, the results seem interesting, but the methods are not explained well at all.  

      This paper must better explain the statistical methods so that they can be evaluated. This is not easy, these are relatively complex methods, but they must be explained much better so the validity of the paper can be examined.  

      Regarding the optimal time point, we defined it as the time point with the highest decoding accuracy, determined during the validation of the localizer data using a leave-one-out cross-validation scheme. This optimal time point was participant- and motion-direction-specific, as the latency to achieve the peak decoding accuracy varied across individuals and motion directions. For group-level visualization, we circularly shifted the data over time, aligning each optimal time point to a common reference point (arbitrarily set at 200 ms after stimulus onset). Importantly, however, these time points are unrelated to the data in the main phase, as the models were trained using the independent localizer data and then applied to each time point during the blank period in the main phase.
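      The circular shift used for group-level visualization can be illustrated as follows. This is a sketch under stated assumptions: the function name, the 100 ms sampling step, and the accuracy values are hypothetical; only the idea of moving each peak to a common reference at 200 ms comes from the description above.

```python
def align_peak(series, peak_index, reference_index):
    """Circularly shift `series` so the sample at `peak_index` lands on `reference_index`."""
    n = len(series)
    shift = (reference_index - peak_index) % n
    # Output element i is taken from input position (i - shift), wrapping around.
    return [series[(i - shift) % n] for i in range(n)]


# Hypothetical accuracy time course sampled every 100 ms, peaking at 400 ms (index 4).
accuracy = [0.20, 0.22, 0.35, 0.48, 0.55, 0.41, 0.30, 0.21]

# Move the peak to the common reference point at 200 ms (index 2).
aligned = align_peak(accuracy, peak_index=4, reference_index=2)
```

      Because the shift is circular, no samples are dropped; the values that fall off one end of the time course reappear at the other.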

      Regarding the TDLM algorithm, detailed descriptions of the algorithm have been provided in the revised Methods section (Line 683 – 735). Furthermore, we have included explanatory notes in the main text and figure legend to provide immediate context for terms such as "reactivation probability" (Lines 247 – 248, Lines 275 – 276).

      This paper uses MEG in humans, a non-invasive technique. This allows for such results in humans. Indeed (if the methods are correct) these units can be decoded to provide statistically significant estimates of motion direction. Note, however, that the spatial resolution of MEG is limited. The decoding accuracies of above 50% are way above chance. Note however that if actual motion-sensitive neurons (e.g. area MT) were recorded, and even if the motion is far from 100% coherence, the decoding accuracy would approach 100%. 

      We agree with the reviewer that decoding accuracy would approach 100% if single-neuron data from motion-sensitive areas (e.g., area MT) were recorded, given the exceptionally high signal-to-noise ratio (SNR) of such data. However, two considerations inform the methodology of our study.

      First, while single-neuron recordings provide invaluable insights, acquiring such data in humans is both ethically challenging and logistically impractical.

      Non-invasive MEG, by contrast, offers a practical alternative that can achieve robust decoding of population-level activity with a reasonable SNR.

      Second, the primary goal of our study was not merely to achieve high decoding accuracy but also to examine the replay of an exposed motion sequence in the human visual cortex. To achieve this, we first needed to train feature-specific models that can be used to decode the spontaneous reactivations of the four motion directions during the blank period. The ability to distinguish representations of the four motion directions was essential for calculating the “sequenceness” of the exposed motion sequence in the TDLM algorithm. While the absolute decoding accuracy of MEG data may not match that of single-neuron data, an important outcome was the successful construction of feature-specific models for the four motion directions (Figure 3B in the manuscript). These models provided a robust foundation for investigating sequential replay in the brain. These results also align with the broader goal of leveraging MEG data to study dynamic neural processes in humans, even in the face of its spatial resolution limitation.

      Minor:  

      (1) Line 246 - there is no figure S2A, subplots are not labeled.  

      We have corrected this in the revised manuscript.

      (2) Is Figure 3B referred to in the text? Same for 3C. This figure is there for explaining the statistical models used, but it is not well utilized.

      We have modified the text to clarify this issue in the revised manuscript.

      (3) English:  

      There are problems with the use of English in the paper, this should be corrected in the next version. A few examples are below.  

      Noises -> noise  

      - "along the motion path in visual cortex" What does this sentence mean? Is this referring to motion-sensitive areas in the brain? Please clarify.  

      There are many other examples. This is minor, but should be corrected.

      We have corrected these errors in the revised manuscript.

      References

      Barne, L. C., Cravo, A. M., de Lange, F. P., & Spaak, E. (2022). Temporal prediction elicits rhythmic preactivation of relevant sensory cortices. European Journal of Neuroscience, 55(11–12), 3324–3339. https://doi.org/10.1111/ejn.15405

      Ekman, M., Kusch, S., & de Lange, F. P. (2023). Successor-like representation guides the prediction of future events in human visual cortex and hippocampus. eLife, 12, e78904. https://doi.org/10.7554/eLife.78904

      Gillespie, A. K., Maya, D. A. A., Denovellis, E. L., Liu, D. F., Kastner, D. B., Coulter, M. E., Roumis, D. K., Eden, U. T., & Frank, L. M. (2021). Hippocampal replay reflects specific past experiences rather than a plan for subsequent choice. Neuron, 109(19), 3149-3163.e6. https://doi.org/10.1016/j.neuron.2021.07.029

      Gridchyn, I., Schoenenberger, P., O’Neill, J., & Csicsvari, J. (2020). Assembly-Specific Disruption of Hippocampal Replay Leads to Selective Memory Deficit. Neuron, 106(2), 291-300.e6. https://doi.org/10.1016/j.neuron.2020.01.021

      Herrmann, B., Henry, M. J., Haegens, S., & Obleser, J. (2016). Temporal expectations and neural amplitude fluctuations in auditory cortex interactively influence perception. NeuroImage, 124, 487–497. https://doi.org/10.1016/j.neuroimage.2015.09.019

      Igata, H., Ikegaya, Y., & Sasaki, T. (2021). Prioritized experience replays on a hippocampal predictive map for learning. Proceedings of the National Academy of Sciences, 118(1), e2011266118. https://doi.org/10.1073/pnas.2011266118

      Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of Neuronal Oscillations as a Mechanism of Attentional Selection. Science, 320(5872), 110–113. https://doi.org/10.1126/science.1154735

      Lakatos, P., Musacchia, G., O’Connel, M. N., Falchier, A. Y., Javitt, D. C., & Schroeder, C. E. (2013). The Spectrotemporal Filter Mechanism of Auditory Selective Attention. Neuron, 77(4), 750–761. https://doi.org/10.1016/j.neuron.2012.11.034

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Ólafsdóttir, H. F., Bush, D., & Barry, C. (2018). The Role of Hippocampal Replay in Memory and Planning. Current Biology, 28(1), R37–R50. https://doi.org/10.1016/j.cub.2017.10.073

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Overall I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), and lends confidence to the conclusions about the existence of an optimal memory duration. There are a few questions that could be expanded on in future studies:

      (1) Spatial encoding requirements

      The manuscript contrasts the approach taken here (reinforcement learning in a gridworld) with strategies that involve a "spatial map" such as infotaxis. However, the gridworld navigation algorithm has an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right), and wind direction is defined in these coordinates. Future studies might ask if an agent can learn the strategy without a known wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates). In discussing possible algorithms, and the features of this one, it might be helpful to distinguish (1) those that rely only on egocentric computations (run and tumble), (2) those that rely on a single direction cue such as wind direction, (3) those that rely on allocentric representations of direction, and (4) those that rely on a full spatial map of the environment.

      We agree that the question of which orientation skills are needed to implement an algorithm is interesting. We remark that our agents do not use allocentric directions in the sense of north, east, south and west relative to, e.g., fixed landmarks in the environment. Instead, directions are defined relative to the mean wind, which is assumed fixed and known. (In our first answer to the reviewers we used “north, east, south, west relative to mean wind”, which may have caused confusion, but in the manuscript we only use upwind, downwind and crosswind.)

      (2) Recovery strategy on losing the plume

      The authors explore several recovery strategies upon losing the plume, including backtracking, circling, and learned strategies, finding that a learned strategy is optimal. As insects show a variety of recovery strategies that can depend on the model of locomotion, it would be interesting in the future to explore under which conditions various recovery strategies are optimal and whether they can predict the strategies of real animals in different environments.

      Agreed, it will be interesting to study systematically the emergence of distinct recovery strategies and compare to living organisms.

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest that the number of olfactory states could potentially be reduced to reduce computational cost. They show that reducing the number of olfactory states to 1 dramatically reduces performance. In the future it would be interesting to identify optimal internal representations of odor for navigation and to compare these to those found in real olfactory systems. Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?

      We agree that minimal odor representations are an intriguing question. While tabular Q learning cannot derive optimal odor representations systematically, one could expand on the approach we have taken here and provide more comparisons. It will be interesting to follow this approach in a future study.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the problem of olfactory search in turbulent environments using artificial agents trained using tabular Q-learning, a simple and interpretable reinforcement learning (RL) algorithm. The agents are trained solely on odor stimuli, without access to spatial information or prior knowledge about the odor plume's shape. This approach makes the emergent control strategy more biologically plausible for animals navigating exclusively using olfactory signals. The learned strategies show parallels to observed animal behaviors, such as upwind surging and crosswind casting. The approach generalizes well to different environments and effectively handles the intermittency of turbulent odors.

      Strengths:

      * The use of numerical simulations to generate realistic turbulent fluid dynamics sets this paper apart from studies that rely on idealized or static plumes.

      * A key innovation is the introduction of a small set of interpretable olfactory states based on moving averages of odor intensity and sparsity, coupled with an adaptive temporal memory.

      * The paper provides a thorough analysis of different recovery strategies when an agent loses the odor trail, offering insights into the trade-offs between various approaches.

      * The authors provide a comprehensive performance analysis of their algorithm across a range of environments and recovery strategies, demonstrating the versatility of the approach.

      * Finally, the authors list an interesting set of real-world experiments based on their findings, that might invite interest from experimentalists across multiple species.

      Weaknesses:

      * Using tabular Q-learning is both a strength and a limitation. It's simple and interpretable, making it easier to analyze the learned strategies, but the discrete action space seems somewhat unnatural. In real-world biological systems, actions (like movement) are continuous rather than discrete. Additionally, the ground-frame actions may not map naturally to how animals navigate odor plumes (e.g. insects often navigate based on their own egocentric frame).

      We agree with the reviewer, and look forward to studying this problem further to make it suitable for meaningful comparisons with animal behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my major concerns and I support publication of this interesting manuscript. A couple of small suggestions:

      (1) In discussing performance in different environments (line 328-362) it might be easier to read if you referred to the environments by descriptive names rather than numbers.

      Thank you for the suggestion, which we implemented

      (2) Line 371: measurements of flow speed depend on antennae in insects. Insects can measure local speed and direction of flow using antennae, e.g. Bell and Kramer, 1979; Suver et al., 2019; Okubo et al., 2020.

      Thank you for the references

      (3) line 448: "Similarly, an odor detection elicits upwind surges that can last several seconds" maybe "Similarly, an odor detection elicits upwind surges that can outlast the odor by several seconds"?

      Thank you for the suggestion

      Reviewer #2 (Recommendations for the authors):

      I commend the authors for their revisions in response to reviewer feedback.

      While I appreciate that the manuscript is now accompanied by code and data, I must note that the accompanying code-repository lacks proper instructions for use and is likely incomplete (e.g. where is the main function one should run to run your simulations? How should one train? How should one recreate the results? Which data files go where?).

      For examples of high-quality code-release, please see the documentation for these RL-for-neuroscience code repositories (from previously published papers):

      https://github.com/ryzhang1/Inductive_bias

      https://github.com/BruntonUWBio/plumetracknets

      The accompanying data does provide snapshots from their turbulent plume simulations, which should be valuable for future research.

      Thank you for the suggestions on how to improve the clarity of the code. We designed the repository to serve both for developing the code and for sharing it, because we will build on this work going forward. Nothing is missing from the repository (we know this because it is what we actually use).

      We do plan to create a more user-friendly version of the code. We hope it will be ready in the next few months, but it won't be immediate, as we also aim to integrate other aspects of the work we are currently doing in the lab. The Brunton repository is very well organized; thanks for the pointer.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Overall I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), and lend confidence to the conclusions about the existence of an optimal memory duration. There are a few points or questions that could be addressed in greater detail in a revision:

      (1) Discussion of spatial encoding

      The manuscript contrasts the approach taken here (reinforcement learning in a grid world) with strategies that involve a "spatial map" such as infotaxis. The authors note that their algorithm contains "no spatial information." However, I wonder if further degrees of spatial encoding might be delineated to better facilitate comparisons with biological navigation algorithms. For example, the gridworld navigation algorithm seems to have an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right). I assume this is how the agent learns to move upwind in the absence of an explicit wind direction signal. However, not all biological organisms likely have this allocentric representation. Can the agent learn the strategy without wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates)? In discussing possible algorithms, and the features of this one, it might be helpful to distinguish (1) those that rely only on egocentric computations (run and tumble), (2) those that rely on a single direction cue such as wind direction, (3) those that rely on allocentric representations of direction, and (4) those that rely on a full spatial map of the environment.

      As Referee 1 points out, even if the algorithm does not require a map of space, the agent must still tell apart directions relative to the wind direction, which is assumed known. Indeed, although in the manuscript we labeled actions allocentrically as “up, down, left and right”, the source is always placed in the same location; hence “left” corresponds to upwind, “right” to downwind, and “up” and “down” to crosswind right and left. Directions are therefore defined relative to the mean wind, which is assumed known. We have clarified the spatial encoding required to implement these strategies, and re-labeled the directions as upwind, downwind, crosswind-right and crosswind-left.

      In reality, animals cannot measure the mean flow, but rather the local flow speed e.g. with antennas for insects, with whiskers for rodents and with the lateral line for marine organisms. Further work is needed to address how local flow measures enable navigation using Q learning.
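      To make the role of the wind-relative actions concrete, the standard tabular Q-learning update with the four re-labeled actions can be sketched as below. This is a generic illustration, not the code in our repository: the function names, the epsilon-greedy exploration, and the values of the learning rate and discount factor are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Actions are defined relative to the mean wind, which is assumed known.
ACTIONS = ["upwind", "downwind", "crosswind-right", "crosswind-left"]


def make_q_table():
    """Q-table mapping each olfactory state to one value per action, initialised to 0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else take the best action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update; alpha and gamma are illustrative values."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

      Note that nothing in this update refers to position: the agent's state is whatever olfactory state it currently occupies, and the only directional knowledge it needs is the mean wind direction that gives the four actions their meaning.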

      (2) Recovery strategy on losing the plume

      While the approach to encoding odor dynamics seems highly principled and reaches appealingly intuitive conclusions, the approach to modeling the recovery strategy seems to be more ad hoc. Early in the paper, the recovery strategy is defined to be path integration back to the point at which odor was lost, while later in the paper, the authors explore Brownian motion and a learned recovery based on multiple "void" states. Since the learned strategy works best, why not first consider learned strategies, and explore how lack of odor must be encoded or whether there is an optimal division of void states that leads to the best recovery strategies? Also, although the authors state that the learned recovery strategies resemble casting, only minimal data are shown to support this. A deeper statistical analysis of the learned recovery strategies would facilitate comparison to those observed in biology.

      We thank Referee 1 for their remarks and for the suggestion to give the learned recovery a more prominent role and to characterize it better. We agree that what is done in the void state is key to turbulent navigation. In the revised manuscript, we have further substantiated the statistics of the learned recovery by repeating training 20 times and comparing the trajectories in the void (Figure 3 figure supplement 3, new Table 1). We believe, however, that starting with the heuristic recovery is clearer because it allows us to introduce the concept of recovery more plainly. Indeed, the learned “recovery” is so flexible that it ends up mixing recovery (crosswind motion) with aspects of exploitation (surge); we defer to future work a more in-depth analysis that disentangles these two aspects. Also, we added a new comparison with other biologically inspired recoveries, both in the native environment and for generalization (Figures 3 and 5).

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest (line 280) that the number of olfactory states could potentially be reduced to reduce computational cost. This raises the question of whether there is a maximally efficient representation of odors and blanks sufficient for effective navigation. The authors choose to represent odor by 15 states that allow the agent to discriminate different spatial regimes of the stimulus, and later introduce additional void states that allow the agent to learn a recovery strategy. Can the number of states be reduced or does this lead to loss of performance? Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?

      We thank the referee for their comment. Q learning defines the olfactory states prior to training and does not allow a systematic optimization of odor representation for the task. We can however compare different definitions of the olfactory states, for example based on the same features but different discretizations. We added a comparison with a drastically reduced number of non-empty olfactory states to just 1, i.e. if the odor is above threshold at any time within the memory, the agent is in the non-void olfactory state, otherwise it is in the void state. This drastic reduction in the number of olfactory states results in less positional information and degrades performance (Figure 5 figure supplement 5).
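      The drastically reduced representation described above amounts to a single test on the odor history held in memory. The following is a minimal sketch of that rule, assuming the memory is expressed as a number of past time steps; the function name and argument names are hypothetical.

```python
def binary_olfactory_state(odor_history, memory, threshold):
    """Return "non-void" if odor exceeded `threshold` at any time within the memory window."""
    if memory <= 0:
        raise ValueError("memory must be a positive number of time steps")
    recent = odor_history[-memory:]  # the last `memory` samples
    return "non-void" if any(c > threshold for c in recent) else "void"
```

      For example, with odor_history = [0, 0, 0.5, 0, 0] and threshold = 0.1, a memory of 3 steps still contains the detection and yields the non-void state, whereas a memory of 2 steps yields the void state.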

      The number of void states is already minimal: we chose 50 void states because this matches the time agents typically remain in the void (fewer than 50 void states results in no convergence, and more than 50 introduces states that are rarely visited).

      One may instead resort to deep Q-learning or to recurrent neural networks, which, however, do not reveal which features or olfactory states drive behavior (see discussion in manuscript and questions below).

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the problem of olfactory search in turbulent environments using artificial agents trained using tabular Q-learning, a simple and interpretable reinforcement learning (RL) algorithm. The agents are trained solely on odor stimuli, without access to spatial information or prior knowledge about the odor plume's shape. This approach makes the emergent control strategy more biologically plausible for animals navigating exclusively using olfactory signals. The learned strategies show parallels to observed animal behaviors, such as upwind surging and crosswind casting. The approach generalizes well to different environments and effectively handles the intermittency of turbulent odors.

      Strengths:

      (1) The use of numerical simulations to generate realistic turbulent fluid dynamics sets this paper apart from studies that rely on idealized or static plumes.

      (2) A key innovation is the introduction of a small set of interpretable olfactory states based on moving averages of odor intensity and sparsity, coupled with an adaptive temporal memory.

      (3) The paper provides a thorough analysis of different recovery strategies when an agent loses the odor trail, offering insights into the trade-offs between various approaches.

      (4) The authors provide a comprehensive performance analysis of their algorithm across a range of environments and recovery strategies, demonstrating the versatility of the approach.

      (5) Finally, the authors list an interesting set of real-world experiments based on their findings, that might invite interest from experimentalists across multiple species.

      Weaknesses:

      (1) The inclusion of Brownian motion as a recovery strategy, seems odd since it doesn't closely match natural animal behavior, where circling (e.g. flies) or zigzagging (ants' "sector search") could have been more realistic.

      We agree that Brownian motion may not be biologically plausible; we used it as a simple benchmark. We clarified this point, and re-trained our algorithm with adaptive memory using circling and zigzagging (cast and surge) recoveries. The learned recovery outperforms all heuristic recoveries (Figure 3D, metrics G). Circling ranks second, and achieves these good results by further decreasing the probability of failure at a slight cost in speed. When tested in the non-native environments 2 to 6, the learned recovery performs best in environments 2, 5 and 6, i.e. from long range, which is more relevant to flying insects; whereas circling generalizes best in the odor-rich environments 3 and 4, representative of closer range and proximity to the substrate (Figure 5B, metrics G). In the new environments, as in the native environment, circling favors convergence (Figure 5B, metrics f^+) over speed (Figure 5B, metrics g^+ and τ_min/τ), which is particularly deleterious at large distance.

      (2) Using tabular Q-learning is both a strength and a limitation. It's simple and interpretable, making it easier to analyze the learned strategies, but the discrete action space seems somewhat unnatural. In real-world biological systems, actions (like movement) are continuous rather than discrete. Additionally, the ground-frame actions may not map naturally to how animals navigate odor plumes (e.g. insects often navigate based on their own egocentric frame).

      We agree with the reviewer that animal locomotion does not look like a series of discrete displacements on a checkerboard. However, to overcome this limitation, one has to first focus on a specific system to define actions in a way that best adheres to a species’ motor controls. Moreover, these actions are likely continuous, which makes reinforcement learning notoriously more complex. While we agree that more realistic models are definitely needed for a comparison with real systems, this remains outside the scope of the current work. We have added a remark to clarify this limitation.

      (3) The lack of accompanying code is a major drawback since nowadays open access to data and code is becoming a standard in computational research. Given that the turbulent fluid simulation is a key element that differentiates this paper, the absence of simulation and analysis code limits the study's reproducibility.

      We have published the code and the datasets at

      - code: https://github.com/Akatsuki96/qNav

      - datasets: https://zenodo.org/records/14655992

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 59-69: In comparing the results here to other approaches (especially the Verano and Singh papers), it would also be helpful to clarify which of these include an explicit representation of the wind direction. My understanding is that both the Singh and Verano approaches include an explicit representation of wind direction. In Singh wind direction is one of the observations that inputs to the agent, while in Verano, the actions are defined relative to the wind direction. In the current paper, my understanding is that there is no explicitly defined wind direction, but because movement directions are encoded allocentrically, the agent is able to learn the upwind direction from the structure of the plume- is this correct? I think this information would be helpful to spell out and also to address whether an agent without any allocentric direction sense can learn the task.

      Thank you for the comment. In our algorithm the directions are defined relative to the mean wind, which is assumed known, as in Verano et al. As far as we understand, Singh et al provide the instantaneous, egocentric wind velocities as part of the input.

      (1) Line 105: "several properties of odor stimuli depend on the distance from the source" might cite Boie...Victor 2018, Ackles...Schaefer, 2021, Nag...van Breugel 2024.

      Thank you for the suggestions - we have added these references

      (2) Line 130: "we first define a finite set of olfactory states" might be helpful to the reader to state what you chose in this paragraph rather than further down.

      We have slightly modified the incipit of the paragraph. We first declare we are setting out to craft the olfactory states, then define the challenges, finally we define the olfactory states.

      (3) Line 267: "Note that the learned recovery strategy resembles casting behavior observed in flying insects" Might note that insects seem to deploy a range of recovery strategies depending on locomotor mode and environment. For example, flying flies circle and sink when odor is lost in windless environments (Stupski and van Breugel 2024).

      Thank you for your comment. We have included the reference and we now added comparisons to results using circling and cast & surge recovery strategies.

      (4) Line 289: "from positions beyond the source, the learned strategy is unable to recover the plume as it mostly casts sideways, with little to no downwind action" This is curious as many insects show a downwind bias in the absence of odor that helps them locate the plumes in the first place (e.g. Wolf and Wehner, 2000, Alvarez-Salvado et al. 2018). Is it possible that the agent could learn a downwind bias in the absence of odor if given larger environments or a longer time to learn?

      The reviewer is absolutely correct: downwind motion is not observed in the recovery simply because the agent rarely overshoots the source. Hence overall optimization for that condition is washed out by the statistics. We believe downwind motion will emerge if an agent needs to avoid overshooting the source. We do not have conclusive results yet but are planning to introduce such flexibility in future work. We added this remark and the references.

      (5) Line 377-391: testing these ideas in living systems. Interestingly, Kathman..Nagel 2024 (bioRxiv) shows exactly the property predicted here and in Verano in fruit flies- an odor memory that outlasts the stimulus by a duration of several seconds, appropriate for filling in "blanks." Relatedly, Alvarez-Salvado et al. 2018 showed that fly upwind running reflected a temporal integration of odor information over ~10s, sufficient to avoid responding to blanks as loss of odor.

      Indeed, we believe this is the most direct connection between algorithms and experiments. We are excited to discuss with our colleagues and pursue a more direct comparison with animal behavior. We were aware of the references and forgot to cite them; thank you for your careful reading of our work!

      Reviewer #2 (Recommendations for the authors):

      Suggestions

      (1) The paper does not clearly specify which type of animals (e.g., flying insects, terrestrial mammals) the model is meant to approximate or not approximate. The authors should consider clarifying how these simulations are suited to be a general model across varied olfactory navigators. Further, it isn't clear how low/high the intermittency studied in this model is compared to what different animals actually encounter. (Minor: The Figure 4 occupancy circles visualization could be simplified).

      Environment 1 represents the lower layers of a moderately turbulent boundary layer. Search occurs on a horizontal plane about half a meter from the ground. The agent is trained at distances of about 10 meters and also tested on longer distances of ~17 meters (environment 6), lower heights of ~1 cm from the ground (environments 3-4), lower Reynolds number (environment 5) and higher threshold of detection (environments 2 and 4). Thus Environments 1, 2, 5 and 6 are representative of conditions encountered by flying organisms (or pelagic in water), and Environments 3 and 4 of searches near the substrate, potentially involved in terrestrial navigation (benthic in water). Even near the substrate, we use odor dispersed in the fluid, and not odor attached to the substrate (relevant to trail tracking).

      Also note that we pick Schmidt number Sc = 1 and this is appropriate for odors in air but not in water. However, we expect a weak dependence on the Schmidt number as the Batchelor and Kolmogorov scales are below the size of the source and we are interested in the large scale statistics (Falkovich et al., 2001; Celani et al., 2014; Duplat et al., 2010).

      Intermittency contours are shown in Fig 1C, they are highest along the centerline, and decay away from the centerline, so that even within the plume detecting odor is relatively rare. Only a thin region near the centerline has intermittency larger than 66%; the outer and most critical bin of the plume has intermittency under 33%; in the furthest point on the centerline intermittency is <10%. For reference, experimental values in the atmospheric boundary layer report intermittency 25% to 20% at 2 to 15m from the source along the centerline (Murlis and Jones, 1981).

      We have more clearly labeled the contours in Fig 1C and added these remarks.

      We included these remarks and added a whole table with matching to real conditions within the different environments.

      (2) Could some biological examples and references be added to support that backtracking is a biologically plausible mechanism?

      Backtracking was observed e.g. in ants displaced in unfamiliar environments (Wystrach et al, P Roy Soc B, 280, 2013), in tsetse flies executing reverse turns uncorrelated to wind, which bring them back towards the location where they last detected odor (Torr, Phys Entom, 13, 1988, Gibson & Brady Phys Entom 10, 1985) and in cockroaches upon loss of contact with the plume (Willis et al, J. Exp. Biol. 211, 2008). It is also used in computational models of olfactory navigation (Park et al, Plos Comput Biol, 12:e1004682, 2016).

      (3) Hand-crafted features can be both a strength and a limitation. On the one hand, they offer interpretability, which is crucial when trying to model biological systems. On the other hand, they may limit the generality of the model. A more thorough discussion of this paper's limitations should address this.

      (4) The authors mention the possibility of feature engineering or using recurrent neural networks, but a more concrete discussion of these alternatives and their potential advantages/disadvantages would be beneficial. It should be noted that the hand-engineered features in this manuscript are quite similar to what the model of Singh et al suggests emerges in their trained RNNs.

      Merged answer to points 3 and 4.

      We agree with the reviewer that hand-crafted features are both a strength and a limitation in terms of performance and generality. This was a deliberate choice aimed at stripping the algorithm bare of implicit components, both in terms of features and in terms of memory. Even with these simple features, our model performs well in navigating across different signals, consistent with our previous results showing that these features are a “good” surrogate for positional information.

      To search for the most effective temporal features, one may consider a more systematic hand crafting, scaling up our approach. In this case one would first define many features of the odor trace; rank groups of features for their accuracy in regression against distance; train Q learning with the most promising group of features and rank again. Note however that this approach will be cumbersome because multiple factors will have to be systematically varied: the regression algorithm; the discretization of the features and the memory.

      Alternatively, to eliminate hand crafting altogether and seek better performance or generalization, one may consider replacing these hand-crafted features and the tabular Q-learning approach with recurrent neural networks or with finite state controllers. On the flip side, neither of these algorithms will directly provide the most effective features or the best memory, because these properties are hidden within the parameters that are optimized for. So extra work is needed to interrogate the algorithms and extract this information. For example, in Singh et al, the principal components of the hidden states in trained agents correlate with head direction, odor concentration and time since last odor encounter. More work is needed to move beyond correlations and establish more systematically which features drive behavior in the RNN.

      We have added these points to the discussion.

      (5) Minor: the title of the paper doesn't immediately signal its focus on recovery strategies and their interplay with memory in the context of olfactory navigation. Given the many other papers using a similar RL approach, this might help the authors position this paper better.

      We agree with the referee and have modified the title to reflect this.

      (6) Minor: L 331: "because turbulent odor plumes constantly switch on and off" -- the signal received rather than the plume itself is switching on and off.

      Thank you for the suggestion, we implemented it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the study "Re-focusing visual working memory during expected and unexpected memory tests" by Sisi Wang and Freek van Ede, the authors investigate the dynamics of attentional re-orienting within visual working memory (VWM). Utilizing a robust combination of behavioral measures, electroencephalography (EEG), and eye tracking, the research presents a compelling exploration of how attention is redirected within VWM under varying conditions. The research question addresses a significant gap in our understanding of cognitive processes, particularly how expected and unexpected memory tests influence the focus and re-focus of attention. The experimental design is meticulously crafted, enabling a thorough investigation of these dynamics. The figures presented are clear and effectively illustrate the findings, while the writing is concise and accessible, making the complex concepts understandable. Overall, this study provides valuable insights into the mechanisms of visual working memory and attentional re-orienting, contributing meaningfully to the field of cognitive neuroscience. Despite the strengths of the manuscript, there are several areas where improvements could be made.

      We thank the reviewer for this summary and positive appraisal of our study and our findings. In addition, we are of course grateful for the excellent suggestions for improvements that we have embraced to further strengthen our article. 

      Microsaccades or Saccades?

      In the manuscript, the terms "microsaccades" and "saccades" are used interchangeably. For instance, "microsaccades" are mentioned in the keywords, whereas "saccades" appear in the results section. It is crucial to differentiate between these two concepts. Saccades are large, often deliberate eye movements used for scanning and shifting attention, while microsaccades are small, involuntary movements that maintain visual perception during fixation. The authors note the connection between microsaccades and attention, but it is not well-recognized that saccades are directly linked to attention. Despite the paradigm involving a fixation point, it remains unclear whether large eye movements (saccades) were removed from the analysis. The authors mention the relationship between microsaccades and attention but do not clarify whether large eye movements (saccades) were excluded from the analysis. If large eye movements were removed during data processing, this should be documented in the manuscript, including clear definitions of "microsaccades" and "saccades." If such trials were not removed, the contribution of large eye movements to the results should be shown, and an explanation provided as to why they should be considered.

      We thank the reviewer for raising this relevant point. Before turning to this relevant distinction, we first wish to clarify how, for our main aim of tracking the dynamics of ‘re-orienting in working memory’, any spatial modulation in gaze – be it driven by micro- or macro-saccades – suits this purpose. Having made this explicit, we also fully agree that disambiguating the nature of the saccade bias during internal focusing has additional value.

      Because it is notoriously challenging (or at least inherently arbitrary) to draw an absolute fixed boundary between macro- and microsaccades, we instead decided to adopt a two-stage approach to our analysis (building on prior studies from our lab, e.g., de Vries et al., 2023; Liu et al., 2023; Liu et al., 2022). In the first step, we analysed spatial biases in all detected saccades no matter their size (hence our labelling of them as “saccades” when describing these analyses). In a second step, we decomposed and visualized the saccade-rate effect as a function of saccade size in degrees. This second stage directly exposed the ‘nature’ of the saccade bias, as we visualized in Figure 2c (with time on the x axis, saccade size on the y axis, and the spatial modulation color coded). Because these visualizations directly address this major comment, we have now made this key set of results much clearer in our work (we agree that our original visualization of this key aspect of our data was suboptimal). In addition, we have added a similar plot for the saccade data in the test-phase in Supplementary Figure S2b.

      These complementary analyses show how the saccade bias (more toward than away saccades) is indeed predominantly driven by small saccades (hence our labelling as “micro-saccades” when interpreting our findings), and less so by larger saccades associated with looking back all the way to the location where the memory item had been presented at encoding (positioned at 6 degrees). This is important as it helps to arbitrate between fixational/micro-saccadic eye-movement biases (previously associated with covert and internal attention shifts; cf. de Vries et al., 2023; Engbert and Kliegl, 2003; Hafed and Clark, 2002; Liu et al., 2023; Liu et al., 2022) vs. larger eye movements back to the original locations of the item (previously associated with ‘looking at nothing’ during memory retrieval and imagery; cf. Brandt and Stark, 1997; Ferreira et al., 2008; Johansson and Johansson, 2014; Laeng et al., 2014; Martarelli and Mast, 2013; Spivey and Geng, 2001). By adopting this visualization, we can show this while preserving the richness of our data, and without having to a-priori set an (inherently arbitrary) threshold for classifying saccades as either “macro” or “micro”.

      Having explained our rationale, we nevertheless agree with the reviewer that it is worth showing how our time course results hold up when only considering fixational eye movements below 2 visual degrees, which we consider “fixational” provided that our memory stimuli at encoding were presented at 6 visual degrees from central fixation. We show this in Supplementary Figure S1. As can be seen below, our main saccade bias results stay almost the same when restricting our analyses exclusively to fixational saccades within 2 degrees, both when considering our data after the retrocue (Supplementary Figure S1a) as well as after the memory test (Supplementary Figure S1b).
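The restriction to fixational saccades described above can be illustrated with a minimal sketch. It assumes each detected saccade has already been reduced to a size in visual degrees and a "toward"/"away" label relative to the memorized item location; the function name, the data layout and the use of raw counts (rather than time-resolved rates in Hz) are illustrative assumptions, not the authors' pipeline.

```python
def saccade_bias(saccades, max_size_deg=None):
    """Return the toward-minus-away saccade count.

    saccades: iterable of (size_deg, direction) pairs, with direction
              being "toward" or "away".
    max_size_deg: if given, only saccades strictly smaller than this
                  size (in visual degrees) are counted.
    """
    toward = 0
    away = 0
    for size_deg, direction in saccades:
        # Skip saccades at or above the size limit when one is set.
        if max_size_deg is not None and size_deg >= max_size_deg:
            continue
        if direction == "toward":
            toward += 1
        elif direction == "away":
            away += 1
    return toward - away
```

Calling the function once without a limit and once with `max_size_deg=2.0` mirrors the comparison between the all-saccade analysis and the analysis restricted to saccades below 2 degrees.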

      Because we agree this is important complementary data, we have now added this as supplementary figures. In addition, we have added the results to our article. We also point to these additional corroborating findings at key instances in our article:  

      Page 5 (Results)

      “As in prior studies from our lab with similar experimental set-ups, internal attentional focusing was predominantly driven by fixational micro-saccades (small, involuntary eye-movements around current fixation). To reveal this in the current study, we decomposed and visualized the observed saccade-rate effect as a function of saccade size (Figure 2c), following the same procedure as we have adopted in other recent studies on this bias (de Vries et al., 2023; Liu et al., 2023; Liu et al., 2022). As shown in the saccade-size-over-time plots in Figure 2c, also in the current study, the difference between toward and away saccades (with red colours denoting more toward saccades) was predominantly driven by fixational saccades in the micro-saccades range (< 2°).”

      “Moreover, as shown in Supplementary Figure S1a, complementary analyses show that our time course (saccade bias) results hold even when exclusively considering eye movements below 2 visual degrees that we defined as “fixational” provided that the memory items were presented 6 visual degrees from the fixation during encoding. This further corroborates that the bias observed during internal attentional focusing was predominantly driven by fixational micro-saccades rather than looking back to the encoded location of the memory items (cf. Johansson and Johansson, 2014; Richardson and Spivey, 2000; Spivey and Geng, 2001; Wynn et al., 2019).”

      Page 7 (Results):

      “As shown in the corresponding saccade-size-over-time plots in Supplementary Figure S2b, consistent with what we observed following the cue, the difference between toward and away saccades following the test was again predominantly driven by saccades in the fixational microsaccade range (< 2°), and the time course (saccade bias) results hold even when exclusively considering fixational eye movements below 2 visual degrees (Supplementary Figure S1b). Thus, just like mnemonic focusing after the cue, re-orienting after the memory test was also predominantly reflected in fixational micro-saccades, and not looking back at the original location of the memory items that were encoded at 6 degrees away from central fixation.”

      Alpha Lateralization in Attentional Re-orienting

      In the attentional orienting section of the results (Figure 2), the authors effectively present EEG alpha lateralization results with time-frequency plots and topographic maps. However, in the attentional reorienting section (Figure 3), these visualizations are absent. It is important to note that the time period in attentional orienting differs from attentional re-orienting, and consequently, the time-frequency plots and topographic maps may also differ. Therefore, it may be invalid to compute alpha lateralization without a clear alpha activity difference. The authors should consider including time-frequency plots and topographic maps for the attentional re-orienting period to validate their findings.

      We thank the reviewer also for this constructive suggestion. The reason we did not expand on the time-frequency maps and topographies at the test-stage was the relative lack of alpha effects at the test stage (compared to the clearer alpha modulations after the retrocue). Nevertheless, we agree that including these data will increase transparency and the comprehensiveness of our article. We now added time-frequency plots and topographic maps for alpha lateralization in response to the working-memory test in Supplementary Figure S2. As can be seen, the time-frequency plots and topographies in the re-focusing period after the working-memory test were consistent with our time-series plots in Figure 3a – reinforcing how alpha lateralization is generally not clear following the working-memory test. In accordance with this relevant addition, we added the following in the revised manuscript:

      Page 7 (Results):

      “For complementary time-frequency and topographical visualizations, see Supplementary Figure S2a.”

      Onset and Offset Latency of Saccade Bias

      The use of the 50% peak to determine the onset and offset latency of the saccade bias is problematic. For example, if one condition has a higher peak amplitude than another, the standard for saccade bias onset would be higher, making the observed differences between the onset/offset latencies potentially driven by amplitude rather than the latencies themselves. The authors should consider a more robust method for determining saccade bias onset and offset that accounts for these amplitude differences.

      We thank the reviewer for raising this valuable point. We agree that the calculation of onset and offset latencies of the saccade bias could be influenced by the peak amplitude of the waveforms. Thus, we further conducted the Fractional Area Latency (FAL) analysis on the comparison of the saccade bias following the working-memory test between valid cue (expected test) and invalid cue (unexpected test) trials. The FAL analysis has been commonly applied to Event-Related Potentials (ERPs) to estimate the latency of ERP components (Hansen and Hillyard, 1980; Luck, 2005). Instead of relying on the peak latency, the FAL method calculates latency based on a predefined fraction of the area under the waveform. This can provide a more robust measure of component latency. Prompted by this comment, we now also applied FAL analysis to our saccade bias waveforms. This corroborated our original conclusion. Because we believe this is an important complement, we now added these additional outcomes to our article: 

      Page 9 (Results): 

      “We additionally conducted Fractional Area Latency (FAL) analysis on the comparison of the saccade bias following the memory test between valid- and invalid-cue trials to rule out the potential contribution of peak amplitude differences into the onset and offset latency differences (Hansen and Hillyard, 1980; Kiesel et al., 2008; Luck, 2005). Consistent with our jackknife-based latency analysis, the FAL analysis revealed a significantly prolonged saccade bias following the unexpected tests (the invalid-cue trials) vs. expected tests (the valid-cue trials) in both 80% and 60% cue-reliability conditions (411 ms vs. 463 ms, t<sub>(14)</sub> = 2.358, p = 0.034; 417 ms vs. 468 ms, t<sub>(15)</sub> = 2.168, p = 0.047; for 80% and 60%, respectively). Again, there was no significant difference in onset latency following unexpected vs. expected tests. (346 ms vs. 374 ms, t<sub>(14)</sub> = 2.052, p = 0.060; 353 ms vs. 401 ms, t<sub>(15)</sub> = 1.577, p = 0.136; for 80% and 60%, respectively).”

      In accordance, we also added the following to our Methods:

      Page 18 (Methods): 

      “In addition to the jackknife-based latency analysis, we further applied a Fractional Area Latency (FAL) method to the saccade bias comparison between validly and invalidly cued memory tests to rule out the contribution of the peak amplitude difference into the onset and offset latency difference (Hansen and Hillyard, 1980; Kiesel et al., 2008; Luck, 2005). We first defined the onset and offset latency of the saccade bias as the first time point at which 25% or 75% of the total area of the component has been reached, relative to a lower boundary of a difference of 0.3 Hz between toward and away saccades (to remove the influence of noise fluctuations in our difference time course below this lower boundary). The extracted onset and offset latency for all participants was then compared using paired-samples t-tests.”
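As a rough illustration of the fractional-area idea described in this Methods passage, the sketch below finds the first time point at which a given fraction of the area under a waveform has accumulated. It is a sketch under stated assumptions, not the authors' implementation: it assumes uniformly sampled data (so the area is approximated by a sum of samples), it interprets the "lower boundary" as a floor that is subtracted before the area is accumulated, and the function and parameter names are hypothetical.

```python
def fractional_area_latency(times, values, fraction, floor=0.3):
    """Return the first time at which `fraction` of the area above `floor` is reached.

    times, values: equally long sequences sampled at a uniform rate.
    fraction: target share of the total area, e.g. 0.25 (onset) or 0.75 (offset).
    floor: lower boundary; only the part of the waveform above it counts.
    Returns None if the waveform never exceeds the floor.
    """
    # Keep only the part of each sample that lies above the lower boundary.
    clipped = [max(v - floor, 0.0) for v in values]
    total = sum(clipped)
    if total <= 0:
        return None
    running = 0.0
    for t, c in zip(times, clipped):
        running += c
        if running >= fraction * total:
            return t
    return times[-1]
```

Because the latency is defined by a share of the waveform's own area, scaling the whole waveform up or down (with `floor=0.0`) leaves the estimate unchanged, which is why the measure is less sensitive to amplitude differences than a 50%-of-peak criterion.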

      Control Analysis for Trials Not Using the Initial Cue

      The control analysis for trials where participants did not use the initial cue raises several questions:

      (1) The authors claim that "unlike continuous alpha activity, saccades are events that can be classified on a single-trial level." However, alpha activity can also be analyzed at the single-trial level, as demonstrated by studies like "Alpha Oscillations in the Human Brain Implement Distractor Suppression Independent of Target Selection" by Wöstmann et al. (2019). If single-trial alpha activity can be used, it should be included in additional control analyses.

      We agree with the reviewer that alpha activity can also be analyzed at the single-trial level. However, because alpha is a continuous signal, single-trial alpha activity will necessarily be graded (trials with more or less alpha power). This is still different from saccades, that are not continuous signals but true ‘events’ (either a saccade was made, or no saccade was made, with no continuum in between). Because of this unique property, it is possible to sort trials by whether a saccade was present (and, if present, by its direction), in an all-or-none way that is not possible for alpha activity that can only be sorted by its graded amplitude/power. This is the key distinction underlying our motivation to sort the trials based on saccades, as we now make clearer: 

      Page 10 (Results): 

      “Although alpha can also be analyzed at the single-trial level (e.g. Macdonald et al., 2011; Wöstmann et al., 2019; for a review, see Kosciessa et al., 2020), saccades offer the unique opportunity to split trials not by graded amplitude fluctuations but by discrete all-or-none events.” 

      In addition, please note how our saccade markers were also more reliable/sensitive, especially in the subsequent memory-test-phase of interest. This is another reason we decided to focus this control analysis on saccades and not alpha activity. 

      (2) The authors aimed to test whether the re-orienting signal observed after the test is not driven exclusively by trials where participants did not use the initial cue. They hypothesized that "in such a scenario, we should only observe attention deployment after the test stimulus in trials in which participants did not use the preceding retro cue." However, if the saccade bias is the index for attentional deployment, the authors should conduct a statistical test for significant saccade bias rather than only comparing toward-saccade after-cue trials with no-toward-saccade after-cue trials. The null results between the two conditions do not immediately suggest that there is attention deployment in both conditions.

      We thank the reviewer for bringing up this important point. We fully agree and, in fact, we had conducted the relevant statistical analysis for each of the conditions separately (in addition to their comparison). Upon reflection, we came to realize that in our original submission it was easy to overlook this point, and therefore thank the reviewer for flagging this. To make this clearer, we now also added the relevant statistical clusters in Figure 4a,b and more clearly report them in the associated text: 

      Page 10 (Results):

      “As we show in Figure 4a,b, we found clear gaze signatures of attentional deployment in response to expected (valid) memory tests, no matter whether we had pre-selected trials in which we had also seen such deployment after the cue in gaze (cluster P: 0.115, 0.041, 0.027, <0.001 for 80%-valid, 60%-valid, 80%-invalid, 60%-invalid trials, respectively), or not (cluster P: 0.016, 0.009, 0.001, <0.001 for 80%-valid, 60%-valid, 80%-invalid, 60%-invalid trials, respectively).”

      (3) Even if attention deployment occurs in both conditions, the prolonged re-orienting effect could also be caused by trials where participants did not use the initial cue. Unexpected trials usually involve larger and longer brain activity. The authors should perform the same analysis on the time after the removal of trials without toward-saccade after the cue to address this potential confound.

      We thank the reviewer for raising this. It is crucial to point out, however, that after any given 80% or 60% reliable cue, the participants cannot yet know whether the subsequent memory test in that trial will be expected (valid cue) or unexpected (invalid cue). Accordingly, the prolonged re-orienting after unexpected vs. expected memory tests cannot be explained by differential use of the cue (i.e., differential cue-use cannot be a “confound” for differential responses to expected and unexpected memory tests, as observed within the 80 and 60% cue-reliability conditions). 

      Reviewer #2 (Public Review):

      Summary:

      This study utilized EEG-alpha activity and saccade bias to quantify the spatial allocation of attention during a working memory task. The findings indicate a second stage of internal attentional deployment following the appearance of a memory test, revealing distinct patterns between expected and unexpected test trials. The spatial bias observed during the expected test suggests a memory verification process, whereas the prolonged spatial bias during the unexpected test suggests a reorienting response to the memory test. This work offers novel insights into the dynamics of attentional deployment, particularly in terms of orienting and re-orienting following both the cue and memory test.

      Strengths:

      The inclusion of both EEG-alpha activity and saccade bias yields consistent results in quantifying the attentional orienting and re-orienting processes. The data clearly delineate the dynamics of spatial attentional shifts in working memory. The findings of a second stage of attentional re-orienting may enhance our understanding of how memorized information is retrieved.

      Weaknesses:

      Although analyses of neural signatures and saccade bias provided clear evidence regarding the dynamics of spatial attention, the link between these signatures and behavioral performance remains unclear. Given the novelty of this study in proposing a second stage of 'verification' of memory contents, it would be more informative to present evidence demonstrating how this verification process enhances memory performance.

      We thank the reviewer for the positive summary of our work and for highlighting key strengths. We also appreciate the constructive suggestions, such as addressing the link between our observed refocusing signals and behavioral performance in our task. We now performed these additional analyses and added their outcomes to the revised article, as we detail in response to comment 2 below.  

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2 shows graded spatial modulations in both EEG-alpha activity and saccade bias. However, while the imperative 100% cue conditions and 100% validity conditions largely overlap in EEG-alpha activity, a clear difference is present between these two conditions in saccade bias. The cause of the difference in saccade bias is unclear.

      We thank the reviewer for pointing out this interesting difference. At this stage, it is hard to know with certainty whether this reflects a genuine difference in our 100% reliable and 100% imperative cue conditions that is selectively picked up by our gaze but not alpha marker. Alternatively, this may reflect differential sensitivity of our two markers to different sources of noise. Either way, we agree that this observation is worth calling out and reflecting on when discussing these results: 

      Page 6 (Results):  

      “It’s worth noting that while alpha lateralization shows very comparable amplitudes in the imperative-100% and 100% conditions, the saccade bias was larger following imperative-100% vs. 100% reliable cues. This may reflect a difference between these two cueing conditions that is selectively picked up by our gaze marker (though it may also reflect differential sensitivity of our two markers to different sources of noise). […]”

      (2) Figure 3 shows signatures of attentional re-orienting after the memory test presented at the center. When the cue was not 100% valid, a noticeable saccade bias towards the memorized location of the test item was observed. This finding was explained as reflecting a re-orienting to the mnemonic contents. To strengthen this interpretation, I suggest providing evidence for the link between the attentional re-orienting signatures and memory performance.

      We thank the reviewer for this constructive suggestion. We have now sorted trials by behavioral performance using a median split on RT (fast-RT vs. slow-RT trials) and reproduction error (high-accuracy vs. low-accuracy trials). Because we believe the outcomes of these analyses increase the transparency as well as the comprehensiveness of our article, we have now included them as Supplementary Figure S3.

      As shown below, we were able to link the saccade bias following the memory test to subsequent performance, but this reached significance only for the 80% valid-cue trials when splitting by RT (cluster P = 0.001). For the other conditions, we could not establish a reliable difference by our performance splits. Possibly this is due to a lack of sensitivity, given the relatively large number of conditions we had and, consequently, the relatively small number of trials we therefore had per condition (particularly in the invalid-cue condition with unexpected memory tests). We now bring forward these additional outcomes at the relevant section in our Results: 

      Page 7 (Results):

      “We also sorted patterns of gaze bias after the memory test by performance but could only establish a link between this gaze bias and RT following expected memory tests in our 80% cue-reliability condition (cluster P = 0.001, Supplementary Figure S3). The lack of significant statistical differences in the remaining conditions may possibly reflect a lack of sensitivity (insufficient trial numbers) for this additional analysis.”
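
      The median split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `split_by_median`, the array layout (one row per trial, one column per time point), and the choice to assign trials exactly at the median to the lower group are all hypothetical.

```python
import numpy as np


def split_by_median(measure, behaviour):
    """Average a per-trial measure separately for low and high behavioural scores.

    measure   : array (n_trials, n_timepoints), e.g. a gaze-bias time course
                (hypothetical layout, assumed for illustration).
    behaviour : array (n_trials,), e.g. RT or reproduction error per trial.

    Returns (low_mean, high_mean), each of shape (n_timepoints,).
    Trials exactly at the median go to the low group (an assumption).
    """
    measure = np.asarray(measure, dtype=float)
    behaviour = np.asarray(behaviour, dtype=float)
    if measure.shape[0] != behaviour.shape[0]:
        raise ValueError("measure and behaviour must have one entry per trial")

    median = np.median(behaviour)
    low = behaviour <= median    # e.g. fast-RT or high-accuracy trials
    high = ~low                  # e.g. slow-RT or low-accuracy trials
    if not high.any():
        raise ValueError("all trials share one value; cannot split")
    return measure[low].mean(axis=0), measure[high].mean(axis=0)
```

      In such a sketch the split would typically be applied per participant and per condition before comparing the two averages, which is one reason trial counts per cell can become small.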

      (3) When comparing the time course of attentional re-orienting after the memory test, a prolonged attentional re-orienting was observed for unexpected memory tests compared to the expected ones. While the onset latency was similar for unexpected and expected memory tests, the offset latency was prolonged for the unexpected memory test. Could this be attributed to the learned tendency to saccade toward the expected location in more valid trials? In this case, the prolonged re-orienting may indicate increased efforts in suppressing the previously learned tendency.

      We thank the reviewer for bringing up this interesting possibility. In our original interpretation, this prolonged signal reflects a longer time needed to bring the unexpected memory content ‘back in focus’ before being able to report its orientation. At the same time, we agree that there are alternative explanations possible, such as the one raised by the reviewer. We now make this clearer when discussing this finding: 

      Page 14 (Discussion): 

      “[…] attentional deployment did become prolonged when re-focusing the unexpected memory item, likely reflecting prolonged effort to extract the relevant information from the memory item that was not expected to be tested. However, there may also be alternative accounts for this observation, such as suppressing a learned tendency to saccade in the direction of the expected item following an unexpected memory test.”

      (4) To test whether the re-orienting signature is predominantly influenced by trials where participants delayed the use of cue information until the memory test appeared, the authors sorted the trials based on saccade bias after the initial cue. However, it would be more informative to depict the reorienting patterns by sorting trials based on memory performance. The rationale is that in trials where participants delayed using the initial retro-cue, memory performance (e.g., measured by reproduction error) might be less precise due to the extended memory retention period. Compared to saccade bias for initial orienting, memory performance could provide more reliable evidence as it represents a more independent measure.

      We thank the reviewer for this suggestion. As delineated in response to comment 2, we have now conducted this additional analysis and added the relevant outcomes to our article.

      (5) While the number of trials was well-balanced across blocks (~ 240 trials), how did the authors address the imbalance between valid and invalid trials, especially in the 80% cue validity block?

      We thank the reviewer for raising this point. First, we wish to point out that while trial numbers will indeed impact the sensitivity for finding an effect, trial numbers do not bias the mean, and therefore also do not bias the comparison between means. In this light, it is vital to appreciate that our findings do not reflect a significant effect in valid trials but no significant effect in invalid trials (which we agree could be due to a difference in trial numbers), but rather a statistical difference between valid and invalid trials. This significant difference in the means between valid and invalid trials cannot be attributed to a difference in trial numbers between these conditions.

      Having clarified this, we nevertheless agree that it is also worthwhile to empirically validate this assertion and show how our findings hold even when carefully matching the number of trials between valid and invalid conditions (i.e., between expected and unexpected memory tests). To do so, we ran a sub-sampling analysis where we sub-sampled the number of valid trials to match the number of invalid trials available per condition (and averaged the results across 1000 random sub-samplings to increase reliability). As anticipated, this replicated our findings of robust differences between the gaze bias following expected and unexpected memory tests in both our 80% and 60% cue-reliability conditions. We now present these additional outcomes in Supplementary Figure S4.

      Because we agree this is an important and reassuring control analysis, we have now added this to our article:

      Page 9 (Results):

      “To rule out the possibility that the saccade-bias differences following expected and unexpected memory tests are caused by uneven trial numbers (200 vs. 50 trials in the 80% cue-reliability condition, 150 vs. 100 trials in the 60% cue-reliability condition), we ran a subsampling analysis where we sub-sampled the number of valid trials to match the number of invalid trials available per condition (averaging the results across 1000 random sub-samplings to increase reliability). As shown in Supplementary Figure S4, this complementary subsampling analysis confirmed that our observed differences between the saccade bias following expected and unexpected memory tests in both 80% and 60% cue-reliability conditions are robust even when carefully matching the number of trials between validly cued (expected) and invalidly cued (unexpected) memory tests.”
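
      The sub-sampling control described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `subsampled_difference`, the array layout (trials by time points), and the fixed random seed are assumptions made for the example. Only the logic (draw as many valid trials as there are invalid trials, without replacement, and average the resulting differences over 1000 draws) follows the description in the response.

```python
import numpy as np


def subsampled_difference(valid, invalid, n_iter=1000, seed=0):
    """Valid-minus-invalid difference with trial counts matched by sub-sampling.

    valid, invalid : arrays (n_trials, n_timepoints) holding a per-trial
                     measure such as a saccade-bias time course
                     (hypothetical layout, assumed for illustration).
    n_iter         : number of random sub-samplings to average over.

    Returns the difference time course, shape (n_timepoints,).
    """
    valid = np.asarray(valid, dtype=float)
    invalid = np.asarray(invalid, dtype=float)
    n_match = invalid.shape[0]
    if valid.shape[0] < n_match:
        raise ValueError("expected at least as many valid as invalid trials")

    rng = np.random.default_rng(seed)
    invalid_mean = invalid.mean(axis=0)
    diffs = np.empty((n_iter, valid.shape[1]))
    for i in range(n_iter):
        # Draw a random subset of valid trials, matched in size to the
        # invalid trials, without replacement.
        idx = rng.choice(valid.shape[0], size=n_match, replace=False)
        diffs[i] = valid[idx].mean(axis=0) - invalid_mean
    return diffs.mean(axis=0)
```

      Averaging over many random draws reduces the noise introduced by any single sub-sample while keeping the two trial counts equal within each draw.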

      Reviewer #3 (Public Review):

      Summary:

      Wang and van Ede investigate whether and how attention re-orients within visual working memory following expected and unexpected centrally presented memory tests. Using a combination of spatial modulations in neural activity (EEG-alpha lateralization) and gaze bias quantified as time courses of microsaccade rate, the authors examined how retro cues with varying levels of reliability influence attentional deployment and subsequent memory performance. The conclusion is that attentional reorienting occurs within visual working memory, even when tested centrally, with distinct patterns following expected and unexpected tests. The findings provide new value for the field and are likely of broad interest and impact, by highlighting working memory as an action-bound process (in)dependent on (an ambiguous) past.

      Strengths:

      The study uniquely integrates behavioral data (accuracy and reaction time), EEG-alpha activity, and gaze tracking to provide a comprehensive analysis of attentional re-orienting within visual working memory. As typical for this research group, the validity of the findings follows from the task design that effectively manipulates the reliability of retro cues and isolates attentional processes related to memory tests. The use of well-established markers for spatial attention (i.e. alpha lateralization) and more recently entangled dependent variable (gaze bias) is commendable. Utilizing these dependent metrics, the concise report presents a thorough analysis of the scaling effects of cue reliability on attentional deployment, both at the behavioral and neural levels. The clear demonstration of prolonged attentional deployment following unexpected memory tests is particularly noteworthy: although there are, by definition, no significant time clusters (time is not a factor in a statistical sense), the jackknife approach is convincing. Overall, the evidence is compelling, allowing the conclusion of a second stage of internal attentional deployment following both expected and unexpected memory tests, highlighting the importance of memory verification and re-orienting processes.

      Weaknesses:

      I want to stress upfront that these weaknesses are not specific to the presented work and do not affect my recommendation of the paper in its present form.

      Although the sample size is consistent with previous studies, a larger sample could enhance the generalizability and robustness of the findings. The authors acknowledge high noise levels in EEG-alpha activity, which may affect the reliability of this marker. This is a general issue in non-invasive electrophysiology that cannot be handled by the authors, but an interested reader should be aware of it. Effectively, the sensitivity of the gaze analysis appears "better" in part due to the better SNR. The latter also sets the boundaries for single-trial analyses, as the authors correctly mention. In terms of generalizability, I am convinced that the main outcome will likely generalize to different samples and stimulus types. Yet, as is typical for the field, future research could explore different contexts and task demands to validate and extend the findings. The authors provide here how and why (including sharing of data and code).

      We thank the reviewer for summarising our work and for carefully delineating its strengths. We also appreciate the mentioning of relevant generic limitations and agree that important avenues for future studies will be to expand this work with larger sample sizes, complementary measurement techniques, and complementary task contexts and stimuli.    

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, Wang and van Ede successfully demonstrate that attentional re-orienting occurs within visual working memory following both expected and unexpected tests. The conclusions are supported by the data and analyses applied, showing that attentional deployment is scaled by the reliability of retro cues. Centrally presented memory tests can invoke either a verification or a revision of internal focus, the latter thus far not considered in both theory and experimental design in cognitive neuroscience.

      I don't have any recommendations that will significantly change the conclusions.

      We thank the reviewer for having carefully evaluated our work and hope the reviewer will also perceive the changes we made and the additional analyses we added in responses to the other two reviewers as further strengthening our article.

      References

      Brandt SA, Stark LW. 1997. Spontaneous eye movements during visual imagery reflect the content of the visual scene. J Cogn Neurosci 9. doi:10.1162/jocn.1997.9.1.27

      de Vries E, Fejer G, van Ede F. 2023. No obligatory trade-off between the use of space and time for working memory. Communications Psychology.

      Engbert R, Kliegl R. 2003. Microsaccades uncover the orientation of covert attention. Vision Res 43. doi:10.1016/S0042-6989(03)00084-1

      Ferreira F, Apel J, Henderson JM. 2008. Taking a new look at looking at nothing. Trends Cogn Sci 12. doi:10.1016/j.tics.2008.07.007

      Hafed ZM, Clark JJ. 2002. Microsaccades as an overt measure of covert attention shifts. Vision Res 42. doi:10.1016/S0042-6989(02)00263-8

      Hansen JC, Hillyard SA. 1980. Endogenous brain potentials associated with selective auditory attention. Electroencephalogr Clin Neurophysiol 49. doi:10.1016/0013-4694(80)90222-9

      Johansson R, Johansson M. 2014. Look Here, Eye Movements Play a Functional Role in Memory Retrieval. Psychol Sci 25. doi:10.1177/0956797613498260

      Kiesel A, Miller J, Jolicœur P, Brisson B. 2008. Measurement of ERP latency differences: A comparison of single-participant and jackknife-based scoring methods. Psychophysiology 45. doi:10.1111/j.1469-8986.2007.00618.x

      Kosciessa JQ, Grandy TH, Garrett DD, Werkle-Bergner M. 2020. Single-trial characterization of neural rhythms: Potential and challenges. Neuroimage 206. doi:10.1016/j.neuroimage.2019.116331

      Laeng B, Bloem IM, D’Ascenzo S, Tommasi L. 2014. Scrutinizing visual images: The role of gaze in mental imagery and memory. Cognition 131. doi:10.1016/j.cognition.2014.01.003

      Liu B, Alexopoulou SZ, van Ede F. 2023. Jointly looking to the past and the future in visual working memory. Elife.

      Liu B, Nobre AC, van Ede F. 2022. Functional but not obligatory link between microsaccades and neural modulation by covert spatial attention. Nat Commun 13. doi:10.1038/s41467-022-31217-3

      Luck S. 2005. Ten Simple Rules for Designing ERP Experiments. Event-related potentials: A methods handbook.

      Macdonald JSP, Mathan S, Yeung N. 2011. Trial-by-trial variations in subjective attentional state are reflected in ongoing prestimulus EEG alpha oscillations. Front Psychol 2. doi:10.3389/fpsyg.2011.00082

      Martarelli CS, Mast FW. 2013. Eye movements during long-term pictorial recall. Psychol Res 77. doi:10.1007/s00426-012-0439-7

      Richardson DC, Spivey MJ. 2000. Representation, space and Hollywood Squares: Looking at things that aren’t there anymore. Cognition 76. doi:10.1016/S0010-0277(00)00084-6

      Spivey MJ, Geng JJ. 2001. Oculomotor mechanisms activated by imagery and memory: Eye movements to absent objects. Psychol Res 65. doi:10.1007/s004260100059

      van Ede F, Chekroud SR, Nobre AC. 2019. Human gaze tracks attentional focusing in memorized visual space. Nat Hum Behav. doi:10.1038/s41562-019-0549-y

      Wöstmann M, Alavash M, Obleser J. 2019. Alpha oscillations in the human brain implement distractor suppression independent of target selection. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.1954-19.2019

      Wynn JS, Shen K, Ryan JD. 2019. Eye movements actively reinstate spatiotemporal mnemonic content. Vision (Switzerland) 3. doi:10.3390/vision3020021

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript investigates the role of the membrane-deforming cytoskeletal regulator protein Abba in cortical development and its potential implications for microcephaly. It is a valuable contribution to the understanding of Abba's role in cortical development. The strengths and weaknesses identified in the manuscript are outlined below:

      Clinical Relevance:

      The authors identified a patient with microcephaly and a patient with an intellectual disability harboring a mutation in the Abba variant (R671W) adding a clinically relevant dimension to the study.

      Mechanistic Insights:

      The study offers valuable mechanistic insights into the development of microcephaly by elucidating the role of Abba in radial glial cell proliferation, radial fiber organization, and the migration of neuronal progenitors. The identification of Abba's involvement in the cleavage furrow during cell division, along with its interaction with Nedd9 and positive influence on RhoA activity, adds depth to our understanding of the molecular processes governing cortical development. Though the reported results establish the novel interaction between Abba and Nedd9, the authors have not addressed whether the mutant protein loses this interaction and whether that results in the observed effects.

      We appreciate the reviewer’s observation and fully agree that our study does not provide direct evidence that the phenotypes induced by the R671W mutant are mediated through NEDD9. We sincerely apologize if the manuscript inadvertently conveyed this impression.

      While we show that the interaction with NEDD9 plays a role in the action of ABBA, our findings suggest that NEDD9 and RhoA activation have a minor influence on the phenotypes induced by this mutation, as highlighted by the evidence we presented.

      We would like to point out that we have previously addressed this point in the discussion section of the manuscript. For clarity, below is an excerpt from that section:

      “heterozygous expression of the human R671W variant would exert a dominant negative effect on ABBA's role in brain development, leading to microcephaly and cognitive delay. This notion is supported by recent work disclosing an additional patient carrying the R671W variant42. In the same study, significant neurological phenotypes were observed in a Drosophila model in which mim, the ortholog of human MTSS2 and MTSS1, was deleted. However, from a clinical genetics standpoint, it is unlikely to find patients with the recurrent R671W mutation without any homozygous or compound heterozygous loss-of-function mutations elsewhere in the ABBA gene. This could also suggest a gain-of-function effect of the R671W mutation. Supporting this notion, overexpressing ABBA-R671W in cells expressing the wild-type Abba in this study did not result in a dominant-negative decrease in RhoA activation, nor did it affect the expression of PH3 in vivo. These findings make it plausible to suggest that a mechanism responsible for the phenotype associated with overexpression of the human variant may primarily involve post-cell division processes, such as cell migration.”

      We have made corrections to the new version of the manuscript to emphasize this further.

      In Vivo Validation:

      The overexpression of mutant Abba protein (R671W) resulting in phenotypic similarities to Abba knockdown effects supports the significance of Abba in cortical development.

      Reviewer #2 (Public Review):

      Summary:

      Carabalona and colleagues investigated the role of the membrane-deforming cytoskeletal regulator protein Abba (MTSS1L/MTSS2) in cortical development to better understand the mechanisms of abnormal neural stem cell mitosis. The authors used short hairpin RNA targeting Abba20 with a fluorescent reporter coupled with in-utero electroporation of E14 mice to show changes to neural progenitors. They performed flow cytometry for in-depth cell cycle analysis of Abba-shRNA impact on neural progenitors and determined an accumulation in the S phase. Using cultured rat glioma cells and live imaging from cortical organotypic slices from mice in utero electroporated with Abba-shRNA, the authors found Abba played a prominent role in cytokinesis. They then used a yeast-two-hybrid screen to identify three high-confidence interactors: Beta-Trcp2, Nedd9, and Otx2. They used immunoprecipitation experiments from E18 cortical tissue coupled with C6 cells to show Abba's requirement for Nedd9 localization to the cleavage furrow/cytokinetic bridge. The authors performed a shRNA knockdown of Nedd9 by in-utero electroporation of E14 mice and observed similar results as with the Abba-shRNA. They tested a human variant of Abba using in-utero electroporation of cDNA and found disorganized radial glial fibers and misplaced, multipolar neurons, but lacked the impact of cell division seen in the shRNA-Abba model.

      Strengths:

      A fundamental question in biology about the mechanics of neural stem cell division.

      Directly connecting effects in Abba protein to downstream regulation of RhoA via Nedd9.

      Incorporation of human mutation in ABBA gene.

      Use of novel technologies in neurodevelopment and imaging.

      Weaknesses:

      Unexplored components of the pathway (such as what neurogenic populations are impacted by Abba mutation) and unleveraged aspects of their data (such as the live imaging) limit the scope of their findings and leave significant questions about the effect of ABBA on radial glia development.

      (1) The claim of disorganized radial glial fibers lacks quantifications.

      On page 11, the authors claim that knockdown of Abba leads to changes in radial glial morphology observed with vimentin staining. Here they claim misoriented apical processes, detached end feet, and decreased number of RGP cells in the VZ. However, they do not provide quantification of process orientation to better support their first claim. Measurements of radial glia fiber morphology (directionality, length) and angle of division would be metrics that can be applied to data.

      In the corrected version of the manuscript, we provide new quantification of changes in the dispersion of vimentin immunostaining (Supplementary Figure 1).

      Some of these analyses could be done in their time-lapse microscopy images, such as to quantify the number of cell divisions during their period of analysis (though that is short-15 hours).

      This is indeed a very good idea. We have reanalyzed the recordings to follow cell division. Unfortunately, the number of cells that we were able to follow was low, making statistical analysis of the data unreliable. As the reviewer alluded to in the comment, recording times longer than 15 h are required to draw reliable conclusions. Instead, we have performed live-cell imaging using Anillin-GFP co-electroporated with RFP as a marker of mitotic progression. We monitored the distribution of cells showing accumulation of Anillin-GFP in control (Scramble) and ABBA-shRNA3 conditions (these data were added to the new Supplementary Figure 3). Anillin has been shown to be an efficient tool to monitor cell division in vivo, in particular because it displays accumulation and a correlated increase in intensity of Anillin-GFP (Hesse et al., Nature Commun. 2012, DOI: 10.1038/ncomms2089).

      (2) It is unclear where the effect is:

      -In RG or neuroblasts? Is it in cell cleavage that results in the accumulation of cells at VZ (as sometimes indicated by their data like in Figure 2A or 4D)?

      The data suggest that radial glial (RG) cells are indeed blocked prior to abscission. This phenomenon might contribute to the accumulation of cells at the ventricular zone (VZ), as indicated by observations such as those in Figure 2A and 4D. The interruption in cell cleavage likely prevents the proper progression of division, causing RG cells to remain at the VZ rather than proceeding with their normal differentiation or migration processes. This finding highlights a potential mechanistic link between disrupted abscission and cell accumulation in the VZ.

      Interrogation of cell death (such as by cleaved caspase 3) would also help.  

      Caspase-3 cleavage is widely used as a marker for apoptosis; however, it may not be the most reliable tool for monitoring apoptosis during brain cortical development. The developing brain is a highly dynamic environment where caspase-3 activation can be transient and involved in non-apoptotic processes, such as synaptic pruning and neuronal remodeling. This makes it challenging to distinguish caspase-3 activity associated with apoptosis from its roles in physiological processes.

      In contrast, monitoring overall cell survival provides a more reliable measure of developmental outcomes, as it reflects the net balance of cell death and survival mechanisms. By focusing on cell survival, e.g. by quantifying the number of RGPs, we can better assess the functional consequences of apoptosis and its interplay with neurogenesis and other developmental processes. In line with this, we have added more data on the quantification of RGPC as well as their distribution in the new Supplementary Figure 3.

      Given their time-lapse, can they identify what is happening to the RG fiber?

      Both apical and basal endfeet appear to detach and retract prior to radial glial (RG) cell death. This is evident in Figure 1D, as well as from our observation of cellular bodies located far from the ventricular surface (VS), as demonstrated in the new Supplementary Figure 3.

      The authors describe a change in "migration" but do not show evidence for this for either progenitor or neuroblast populations. Given they have nice time-lapse imaging data, could they visualize progenitor versus young neuron migration? Analysis of neuroblasts (such as with doublecortin expression in the tissue) would also help understand any issues in migration (of neurons v stem cells).

      This is an excellent question that arises from the extensive data presented in this study. Addressing it would require repeating a significant portion of the experiments. We fully agree with the reviewer that these are important and obvious questions that warrant a dedicated study to answer them thoroughly. Additionally, we believe that the data showing the accumulation of migrating electroporated cells in the ventricular (V) and subventricular (SV) zones provide compelling evidence of abnormal migration in ABBA-shRNA electroporated cells.

      -At cleavage furrow? In abscission? There is high-resolution data that highlights the cleavage furrow as the location of interest (Figure 3A), however, there is also data (Figure 3B) to suggest Abba is expressed elsewhere as well and there is an overall soma decrease. More detail of the localization of Abba during the division process would be helpful for example, could cleavage furrow proteins, such as Aurora B, co-localization (and potentially co-IP) help delineate subpopulations of Abba protein? Furthermore, the FRET imaging is a unique way to connect their mutation with function - could they measure/quantify differences at furrow compared to the rest of soma to further corroborate that the Abba-associated RhoA effect was furrow-enriched?

      In the corrected version of the manuscript, we include new quantification of RhoA activity in the region corresponding to the cleavage furrow (new Figure 5). These new data show results similar to the previous ones and indicate that the changes observed are primarily derived from the cleavage furrow region. In the future, a detailed dissection of the molecules involved in the mechanism would be highly desirable. These notions are now included in the discussion.

      -The data highlights nicely that a furrow doesn't clearly form when ABBA expression and subsequent RhoA activity are decreased (in Figure 3 or 5A). Does this lead to cells that can't divide because of poor abscission, especially since "rounding" still occurs? Or abnormal progenitors (with loss of fiber or inability to support neuroblast migration)? Or abnormal progression of progenitors to neuroblasts?

      Our findings, combined with previous results, suggest multiple mechanisms through which ABBA depletion and subsequent Nedd9 and RhoA signaling disruptions could impact progenitor cells and neuroblasts. Below is a detailed response to each question: 

      (1) Do cells fail to divide due to poor abscission?

      Nedd9 is a key regulator of RhoA signaling, which could be essential for cleavage furrow ingression and abscission. Reduced Nedd9 expression may lead to non-activation of RhoA, thereby impairing cleavage furrow ingression. Furthermore, since RhoA deactivation is critical for successful abscission, any disruption in this signaling pathway could compromise the final stages of cytokinesis. While we do not directly observe failed abscission, the impaired furrow formation in Figures 3 and 5A aligns with the hypothesis that some cells may struggle to complete division due to defects in RhoA-mediated abscission.

      (2) Are abnormal progenitors generated (e.g., loss of fiber or inability to support neuroblast migration)?

      Disrupted Nedd9 expression not only affects cell cycle progression but also influences the structural integrity of radial glial progenitors (RGPs). RGPs with impaired cleavage furrow ingression may exhibit detachment of apical and basal endfeet (Supplementary Figure 3), leading to abnormalities in their scaffold function. This structural disruption likely contributes to the accumulation of electroporated cells in the ventricular (V) and subventricular (SV) zones (Figure 5A), supporting the idea that abnormal progenitors fail to support proper neuroblast migration. 

      (3) Is there abnormal progression of progenitors to neuroblasts?

      Given that Nedd9 triggers cells to enter mitosis, its impaired function may prevent progenitors from properly progressing through the cell cycle, causing cell cycle arrest and eventually decreased survival. This would directly impact the ability of progenitors to transition into neuroblasts. Moreover, the abnormal membrane composition and PI(4,5)P2 enrichment we hypothesize during cytokinesis could disrupt ABBA recruitment and its interaction with Nedd9. This disruption would impair RhoA activation, further compromising the progression of progenitors to neuroblasts.

      In conclusion, our findings suggest that impaired ABBA expression disrupts Nedd9 and RhoA signaling, leading to poor cleavage furrow ingression, abnormal progenitor structure, and defective neuroblast migration. These processes collectively contribute to developmental defects in the cortex. Future studies focusing on live imaging of cytokinesis and cell fate mapping will help elucidate these mechanisms further.

      (3) Limited to a singular time point of mouse cortical development

      On page 13, the authors outline the results of their Y2H screen with the identification of three high-confidence interactors. Notably, they used an E10.5-E12.5 mouse brain embryo library rather than one that includes E14, the age of their in-utero electroporation mice. Many of the authors' claims focus on in-utero electroporation of shRNA-Abba of E14 mice that are then evaluated at E16-18. Justification for the focus on this age range should be included to support that their findings can then be applied to all mouse corticogenesis.

      We thank the reviewer for pointing this out. Indeed, the data suggest that the interaction between ABBA and Nedd9 occurs before E14. The reason for addressing the questions at E14 is that in earlier work, we have shown that ABBA is mainly expressed through E10.5-12.5 in the floorplate structure formed by radial glia. The radial glia-specific expression was confirmed through double staining with radial glial (RC2) and neuronal (Tuj1) markers at E12.5 (see Saarikangas et al. J. Cell Sci. 121:1444-1454, 2008). Thus, we consider the Y2H library relevant for identifying ABBA's interactors within radial glia. We have specified this better in the corrected manuscript.

      (4) Detail of the effect of the human variant of the ABBA mutation in mice is lacking.

      Their identification of the R671W mutation is interesting and the IUE model warrants more characterization, as they did with their original KD experiments.

      We have now included additional data in the corrected manuscript showing R671W-dependent changes in INM (Supplementary Figure 3).

      Could they show that Abba protein levels are decreased (in either cell lines or electroporated tissue)?

      Estimation of ABBA expression in cells expressing ABBA R671W, as in Supplemental Figure 5, did not show a significant change.

      -While time-lapse morphology might not have been performed, more analysis on cell division phenotype (such as plane of division and radial glia morphology) would be helpful. 

      This would indeed be very informative, but we were not able to perform these analyses on the existing dataset.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for targeting some of the weaknesses by additional experiments:

      Regional Demarcation in Radial Glial Cell Population:

      While the authors demonstrate a decrease in overall RFP-positive cells in response to Abba knockdown, the distinction between different regions should be demarcated using cortical layer-specific markers (e.g., CUX1/BRN2 for the upper layer and CTIP2/FOXP2). Quantification based on regional markers would enhance accuracy and meaningful interpretation.

      To harmonize the quantification across the different developmental stages, we used a broader definition of the cortical regions, which may not fit exactly with the regions identified by Cux1 and CTIP2 staining. We have, however, now included staining for Cux1 and CTIP2 in Supplementary Figure 1, showing the corresponding regions defined in the manuscript.

      Mitotic Stage Marker and BrdU Staining:

      The discrepancy between no changes in staining with the mitotic stage marker PH3 and a reported decrease in Ki67 staining calls for further clarification. Additionally, the use of BrdU staining could distinguish the effects on dividing cells after Abba knockdown. The authors are encouraged to explore these aspects further, including their applicability to NEDD9 knockdown and Abba mutant overexpression.

      As suggested by the reviewer elsewhere, we made use of live imaging. We monitored the distribution of cells showing accumulation of Anillin-GFP in control (Scramble) and ABBA-shRNA3 conditions (these data have been added to the new Supplementary Figure 3). Anillin has been shown to be an efficient tool for monitoring cell cycle stages in vivo (Hesse et al., Nat. Commun. 2012, DOI: 10.1038/ncomms2089). Interestingly, we observed an increase in cells displaying accumulated Anillin in the ABBA-shRNA3 condition, which is consistent with an arrest of mitotic progression.

      Quantification of Cytokinesis Effects:

      The brain slices illustrating the effects of Abba knockdown on cytokinesis would benefit from a quantification depicting changes in interkinetic nuclear migration and the number of successful mitosis events. This would enhance the clarity and interpretation of the observed effects.

      In the revised manuscript, we have included new data in Supplementary Figure 3, where we report the quantification of the distance of the RGC from the ventricle, to address the reviewer’s comments. We were not entirely sure what was meant by the quantification of successful mitosis events but, as specified above, we have included new data from the monitoring of Anillin. We hope to perform more detailed experiments and analyses in future studies.

      Loss of Interaction and NEDD9 Localization:

      The manuscript lacks an exploration of the loss or decrease in interaction between Abba and NEDD9 in the case of the pathogenic patient-derived mutation in Abba. Addressing this aspect is crucial, as it may shed light on the underlying causes of the observed effects. Furthermore, investigating changes in NEDD9 localization following overexpression of the Abba mutant would provide additional insights.

      We fully agree with the reviewer’s comment. Unfortunately, the anti-NEDD9 antibody performed poorly in slice immunohistochemistry, which hampered further reliable investigation of expression and distribution changes in vivo. Resolving this issue and providing a more detailed characterization of the mechanism of Abba-NEDD9 interaction will be important in future studies.

      Overall, I believe that with minor revisions and additional contextualization, the manuscript has the potential to make a significant contribution to the field. I recommend acceptance pending the incorporation of the suggested revisions.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is generally well-organized. We hope that given their nice experimental systems, many of the comments and questions can be addressed with their data already on hand.

      Minor Comments

      • For Figure 6E A closeup of the vimentin would be helpful - hard to visualize radial glia morphology at the current magnification.

      This has been corrected in the new version of the manuscript.

      • For the in utero electroporation what was their rationale for 2-4 day interval before evaluation? For example, waiting for more cortical plate development to be able to manifest long-term effects.

      We observed massive cell death at E18; in only a few of those brains were we still able to observe RFP cells. We also tried P6 animals, but none of them had a significant number of remaining electroporated cells. We therefore decided to focus on E17, 3 days after electroporation, when the shRNA is still sufficiently expressed.

      • Figure 4E-F lacks images of controls for comparison of effect.

      This has been corrected in the revised version of the manuscript.

    1. Author response:

      Reviewer #1:

      The manuscript by Xu et al. explores the regulation of the microtubule minus end protein CAMSAP2 localization to the Golgi by the Serine/threonine-protein kinase MARK2 (PAR1, PAR1B). The authors utilize immunofluorescence and biochemical approaches to demonstrate that MARK2 is localized at the Golgi apparatus via its spacer domain. They show that depletion of this protein alters Golgi morphology and diminishes CAMSAP2 localization to the Golgi apparatus. The authors combine mass spectrometry and immunoprecipitation to show that CAMSAP2 is phosphorylated at S835 by MARK2, and that this phosphorylation regulates localization of CAMSAP2 at Golgi membranes. Further, the authors identify USO1 (p115) as the Golgi resident protein mediating CAMSAP2 recruitment to the Golgi apparatus following S835 phosphorylation. The authors would need to address the following queries to support their conclusions.

      We sincerely thank the reviewer for their valuable time and effort in evaluating our manuscript. We deeply appreciate the constructive feedback and insightful suggestions, which have been instrumental in improving the quality and clarity of our study. We have carefully considered all the comments and have made the necessary revisions to address the concerns raised.

      Major Comments 

      (1) Dynamic localization of CAMSAP2 during Golgi reorientation

      - The authors use fixed wound edge assays and co-localization analysis to describe changes in CAMSAP2 positioning during Golgi reorientation in response to polarizing cues (a free wound edge in this case). In Figure 1C, they present a graphical representation of quantified immunofluorescence images, using color coding to describe the three states of Golgi reorientation in response to a wound (green, blue, red indicating non-polarised, partial and complete Golgi reorientation, respectively). They then use these 'colour coded' classifications to quantitate CAMSAP2/GM130 co-localization. It is unclear why the authors have not just used representative immunofluorescence images in the main figures. Transparent, color overlays could be placed over the cells in the representative images to indicate which of the three described states each cell is currently exhibiting. However, for clarity, I would recommend changing the color coded 'states' to a descriptor rather than a color. i.e. Figure 1D x axis labels should be 'complete' and 'partial', instead of 'red' and 'blue'.

      Thank you for this insightful suggestion. We have added representative immunofluorescence images with transparent color overlay to indicate the three Golgi orientation states. These images are included in Supplementary Figure 2B-C, providing a clear visual reference for the quantitative data. Additionally, we have revised the x-axis labels in Figure 1E from "Red" and "Blue" to "Complete" and "Partial" to ensure clarity and consistency with the descriptive terminology in the text. These changes are described in the Results section (page 7, lines 15-19) and the figure legend (page 29, lines 27-29).

      We believe these updates improve the clarity and accessibility of our figures and hope they address the reviewer’s concerns.

      - note- figure 2 F-G, is semi quantitative, why did the authors not just measure Golgi angle using the nucleus and Golgi distribution?

      We appreciate the reviewer’s comment on this point. Following the recommendation, we have performed an additional analysis measuring Golgi orientation angles based on the nucleus-Golgi distribution. This quantitative approach complements our initial semi-quantitative analysis and provides a more precise assessment of Golgi orientation during cell migration.

      The new data have been incorporated into Supplementary Figure 1F-H. These results clearly demonstrate the consistency between the quantitative and semi-quantitative methods, further validating our findings and highlighting the dynamic changes in Golgi orientation during cell migration. These changes are described in the Results section (page 6, lines 24-31).
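
      For illustration only, and not a description of the authors' actual analysis pipeline, a nucleus-Golgi orientation angle of the kind discussed here could be computed roughly as follows. The function name, the use of centroid coordinates for the nucleus and Golgi, and the wound-direction vector are all assumptions made for this sketch.

```python
import math


def golgi_orientation_angle(nucleus_xy, golgi_xy, wound_direction_xy):
    """Return the angle in degrees (0 to 180) between the nucleus-to-Golgi
    vector and the direction pointing toward the wound edge.

    All inputs are hypothetical (x, y) tuples, e.g. segmentation centroids.
    """
    # Vector from the nucleus centroid to the Golgi centroid.
    vx = golgi_xy[0] - nucleus_xy[0]
    vy = golgi_xy[1] - nucleus_xy[1]
    wx, wy = wound_direction_xy

    norm_v = math.hypot(vx, vy)
    norm_w = math.hypot(wx, wy)
    if norm_v == 0 or norm_w == 0:
        raise ValueError("vectors must have non-zero length")

    # Cosine of the angle from the dot product, clamped against rounding error.
    cos_theta = (vx * wx + vy * wy) / (norm_v * norm_w)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

      Under these assumptions, a small angle means the Golgi faces the wound edge, and an angle near 180 degrees means it sits behind the nucleus. Any threshold used to call a cell "polarized" would be a separate analysis choice not specified here.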

      - While it is established that the Golgi is dispersed during reorientation in wound edge migration, the Golgi apparatus also becomes dispersed/less condensed prior to cell division. As the authors have used fixed images - how are they sure that the Golgi morphology or CAMSAP2 localization in 'blue cells' are indicative of Golgi reorientation and not division? Live imaging of cells expressing CAMSAP2, and an additional Golgi marker could be used to demonstrate that the described changes in Golgi morphology and CAMSAP2 localization are occurring during the rear-to-front transition of the Golgi.

      Thank you for raising this important question. To address this concern, we carefully examined the nuclear morphology of dispersed Golgi cells and found no evidence of mitotic features, indicating that these cells are not undergoing division (Figure 1A, Supplemental Figure 2A). Furthermore, during the scratch wound assay, we use 2% serum to culture the cells, which helps minimize the impact of cell division. This analysis has been added to the Results section (page 7, lines 19-22 in the revised manuscript).

      Additionally, we conducted live-cell imaging, as suggested, using cells expressing a Golgi marker. This approach confirmed that Golgi dispersion occurs transiently during reorientation in cell migration. The new live-cell imaging data have been incorporated into Supplementary Figure 2A, and the corresponding description has been updated in the Results section (page 7, lines 2-5).

      Finally, considering that overexpression of CAMSAP2 can lead to artifactually condensed Golgi structures, we used endogenous staining to observe CAMSAP2 localization at different stages of migration. These observations provide a clearer understanding of CAMSAP2 dynamics during Golgi reorientation and are now presented in revised Figure 1A-B. This information has been described in the Results section (page 7, lines 5-10).

      We hope these additions and clarifications address the reviewer’s concerns. Once again, we are deeply grateful for this constructive feedback, which has greatly improved the robustness of our study.

      (2) MARK2 localization to the Golgi apparatus

      - The authors investigated the positioning of endogenous MARK2 via immunofluorescence staining, and exogenous flag-tagged MARK2 in a KO background. The description of the protocol required to visualize Golgi localization of MARK2 is inconsistent between the results and methods text. The results text reads as though the 2% serum incubation occurs as a blocking step following fixation. Conversely, the methods section describes the 2% serum incubation as occurring just prior to fixation as a form of serum starvation. The authors need to clarify which of these protocols is correct. Further, whilst I can appreciate that the mechanistic understanding of why serum starvation is required for MARK2 Golgi localization is beyond the scope of the current work, the authors should at a minimum speculate in the discussion as to why they think it might occur.

      We sincerely thank the reviewer for the constructive feedback on the localization of MARK2 at the Golgi. Due to the complexity and variability of this phenomenon, we decided to remove the related data from the current manuscript to maintain the rigor of our study. However, we have included a discussion of this phenomenon in the Discussion section (page 13, lines 31-39 and page 14, lines 1-6 in the revised manuscript) and plan to further investigate it in future studies.

      The localization of MARK2 at the Golgi was initially observed in experiments following serum starvation, where cells were fixed and stained (data not shown). This observation was supported by the loss of Golgi localization in MARK2 knockdown cells, indicating the specificity of the antibody (data not shown). However, this phenomenon was not consistently observed across all cells, likely due to its transient nature. We speculate that the localization of MARK2 to the Golgi depends on its activity and post-translational modifications. For example, phosphorylation at T595 has been reported to regulate the translocation of MARK2 from the plasma membrane to the cytoplasm (Hurov et al., 2004). Serum starvation might induce modifications or conformational changes in MARK2, leading to its temporary Golgi localization. Additionally, we hypothesize that this localization may coincide with specific Golgi dynamics, such as the transition from dispersed to ribbon-like structures during cell migration.

      We also acknowledge the inconsistency in the Results and Methods sections regarding serum starvation. We confirm that serum starvation was performed prior to fixation as an experimental condition, rather than as a blocking step in immunostaining. This clarification has been incorporated into the revised Methods section (page 24, lines 11-12).

      We hope this clarification, along with our planned future studies, adequately addresses the reviewer’s concerns. Once again, we deeply appreciate the reviewer’s valuable comments, which have provided important insights for our ongoing work.

      References:

      Hurov, J.B., Watkins, J.L., and Piwnica-Worms, H. (2004). Atypical PKC phosphorylates PAR-1 kinases to regulate localization and activity. Curr Biol 14 (8): 736-741.

      - The authors should strengthen their findings by using validated tools/methods consistent with previous publications. i.e. Waterman lab has published two MARK2 constructs- Apple and eGFP tagged versions (doi.org/10.1016/j.cub.2022.04.088), and the localization of MARK2 in U2Os cells (using the same antibody (Anti- MARK2 C-terminal, ABCAM Cat# ab136872). The authors should (1) image the cells live using eGFP-tagged MARK2 during serum starvation to show the dynamics of this localization, (2) image U2Os cells using the abcam ab136872 antibody +/- 2% serum starve. Two MARK2 antibodies are listed in Table 2. Does abcam (ab133724) show a similar localisation?

      - The Golgi localization of MARK2 occurs in the absence of the T structural domain, but not when full length MARK2 is expressed. The authors conclude the T- domain is likely inhibitory. When combined with the requirement for serum starvation for this interaction to occur, the authors should clarify the physiological relevance of these observations.

      We sincerely thank the reviewer for their valuable suggestions regarding the use of tools and methods and the physiological relevance of MARK2 localization to the Golgi. Regarding the question of how MARK2 itself localizes to the Golgi, we are currently unable to fully elucidate the underlying mechanism. Therefore, we have removed the discussion of MARK2’s Golgi localization from the manuscript to ensure scientific accuracy. However, we provide our detailed response below:

      First, regarding the suggestion to use tools and methods developed by the Waterman lab to strengthen our findings, we have carefully evaluated their applicability. In our live-cell imaging experiments, we found that full-length MARK2 does not stably localize to the Golgi, even under serum starvation conditions. However, truncated MARK2 mutants lacking the Tail (T) domain exhibit robust Golgi localization. Furthermore, our immunofluorescence staining results indicate that the Spacer domain is the minimal region required for MARK2 localization at the Golgi. Based on these findings, we believe that live-cell imaging of EGFP-tagged full-length MARK2 may not effectively reveal the dynamics of its Golgi localization. However, we plan to focus on the truncated constructs in future studies to better explore the mechanisms underlying MARK2's dynamic behavior. 

      Regarding the use of the ab136872 antibody to stain U2OS cells with and without serum starvation, we note that the protocol described by the Waterman lab involves pre-fixation and permeabilization steps, which are not compatible with live-cell imaging. Additionally, we observed that MARK2 Golgi localization appears to be condition-dependent and may coincide with specific Golgi dynamics, such as transitions from dispersed stacks to intact ribbon structures. These events are likely brief and challenging to capture consistently. Nevertheless, we recognize the value of this experimental design and plan to adapt the staining conditions in future work to validate our results further. As for the ab133724 antibody listed in Table 2, we clarify that it has only been validated for Western blotting in our study and does not yield reliable results in immunofluorescence experiments. For this reason, all immunofluorescence staining in this study relied exclusively on ab136872. This distinction has been clarified in the revised Table 2.

      Regarding the hypothesis that the Tail domain of MARK2 is inhibitory, our observations showed that truncated MARK2 mutants lacking the T domain stably localized to the Golgi, whereas full-length MARK2 did not. Literature evidence supports this hypothesis, as studies on the yeast homolog Kin2 indicate that the C-terminal region (including the Tail domain) binds to the N-terminal catalytic domain to inhibit kinase activity (Elbert et al., 2005). We speculate that serum starvation disrupts this intramolecular interaction, relieving the inhibition by the T domain, activating MARK2, and promoting its localization to the Golgi. Moreover, we hypothesize that the transient nature of MARK2 localization to the Golgi may be related to specific Golgi remodeling processes, such as the transition from dispersed stacks to intact ribbon structures during cell migration or polarity establishment.

      References:

      Elbert, M., Rossi, G., and Brennwald, P. (2005). The yeast par-1 homologs kin1 and kin2 show genetic and physical interactions with components of the exocytic machinery. Mol Biol Cell 16 (2): 532-549.

      (3) Phosphorylation of CAMSAP2 by MARK2

      - The authors examined the effects of MARK2 phosphorylation of CAMSAP2 on Golgi architecture through expression of WT-CAMSAP2 and two CAMSAP2 S835 mutants in CAMSAP2 KO cells. They find that CAMSAP2 S835A (non-phosphorylatable) was less capable of rescuing Golgi morphology than CAMSAP2 S835D (phosphomimetic). Golgi area has been measured to demonstrate this phenomenon. Representative immunofluorescence images in Fig. 4D appear to indicate that this is the case. However, quantification in Fig. 4E does not show significance between HA-CAMSAP2 and HA-CAMSAP2A that would support the initial claim. The authors could analyze other aspects of Golgi morphology (e.g. number of Golgi fragments, degree of dispersal around the nucleus) to capture the clear structural defects demonstrated in HA-CAMSAP2A cells.

      We sincerely thank the reviewer for their valuable feedback and for pointing out potential areas of improvement in our analysis of Golgi morphology. We apologize for any misunderstanding caused by our description of the results in Figure 4E.

      The quantification indeed shows a significant difference between HA-CAMSAP2 and HA-CAMSAP2A in terms of Golgi area, as indicated in the figure by the statistical annotations (p-value provided in the legend). To ensure clarity, we have revised the figure legend (page 32, lines 19-23 in the revised manuscript) to explicitly describe the statistical significance and the method used for quantification.

      Because the quantification indeed shows a significant difference between HA-CAMSAP2 and HA-CAMSAP2A in terms of Golgi area, and to maintain consistency throughout the manuscript, we did not further analyze other aspects of Golgi morphology.

      We hope this clarification, along with the additional analyses, will address the reviewer’s concerns. Once again, we are deeply grateful for these constructive comments, which have helped us improve the quality and robustness of our study.

      - Wound edge assays are used to capture the difference in Golgi reorientation towards the leading edge between CAMSAP2 S835A and CAMSAP2 S835D. However, these studies lack comparison to WT-CAMSAP2 that would support the role of phosphorylated CAMSAP2 in reorienting the Golgi in this context.

      We sincerely thank the reviewer for their insightful suggestion. In response, we have added a comparison between CAMSAP2 S835A/D and WT-CAMSAP2, in addition to HT1080 and MARK2 KO cells, to better evaluate the role of phosphorylated CAMSAP2 in Golgi reorientation.

      The results, now shown in Figure 5A-C, indicate that in the absence of MARK2, there is no significant difference in Golgi reorientation between WT-CAMSAP2 and CAMSAP2 S835A. This observation supports the conclusion that MARK2-mediated phosphorylation of CAMSAP2 at S835 is essential for effective Golgi reorientation.

      To enhance clarity, we have updated the corresponding Results section (page 9, lines 37-40 and page 10, line 1 in the revised manuscript) to describe this additional comparison. We believe this analysis strengthens our findings and provides a clearer understanding of the role of phosphorylated CAMSAP2 in Golgi dynamics.

      We hope this additional data addresses the reviewer’s concerns. Once again, we are grateful for the constructive feedback, which has helped improve the clarity and robustness of our study.

      (4) Identification of CAMSAP2 interaction partners

      - Quantification of interaction ability between CAMSAP2 and CG-NAP, CLASP2, or USO1 in Fig. 5D, 5F and 5J respectively, lack WT-CAMSAP2 comparisons.

      We sincerely thank the reviewer for their valuable suggestion. In response, we have included WT-CAMSAP2 data in the quantification of interaction ability between CAMSAP2 and CG-NAP, CLASP2, and USO1. These results, now shown in revised Figures 5 D-G and Figures 6 C-D, provide a direct comparison that further validates the differential interaction abilities of CAMSAP2 mutants.

      The inclusion of WT-CAMSAP2 allows us to better contextualize the effects of specific mutations on CAMSAP2 interactions and strengthens our conclusions regarding the role of these interactions in Golgi dynamics.

      We hope this addition addresses the reviewer’s concerns and enhances the clarity and robustness of our study. We deeply appreciate the constructive feedback, which has been instrumental in improving our manuscript.

      - The CG-NAP immunoblot presented in Fig. 5C shows that the protein is 310 kDa, which is the incorrect molecular weight. CG-NAP (AKAP450) should appear at around 450 kDa. Further, no CG-NAP antibody is included in Table 2 - Information of Antibodies. The authors need to explain this discrepancy.

      We sincerely apologize for the lack of clarity in our annotation and description, which may have caused confusion regarding the CG-NAP immunoblot presented in Figure 5C (Figure 5D in the revised manuscript). To clarify, CG-NAP (AKAP450) is indeed a 450 kDa protein, and the marker at 310 kDa represents the molecular weight marker’s upper limit, above which CG-NAP is observed. This has been clarified in the figure legend (page 33, lines 21-23 in the revised manuscript).

      Regarding the CG-NAP antibody, it was custom-made and purified in our laboratory. Polyclonal antisera against CG-NAP, designated as αEE, were generated by immunizing rabbits with GST-fused fragments of CG-NAP (aa 423–542). This antibody has been validated extensively in our previous research, demonstrating its specificity and reliability (Wang et al., 2017). The details of the antibody preparation are included in the footnote of Table 2 for reference.

      We hope this clarification, along with the additional context regarding the antibody validation, resolves the reviewer’s concerns. We are deeply grateful for the reviewer’s attention to detail, which has helped us improve the clarity and rigor of our manuscript.

      References:

      Wang, J., Xu, H., Jiang, Y., Takahashi, M., Takeichi, M., and Meng, W. (2017). CAMSAP3-dependent microtubule dynamics regulates Golgi assembly in epithelial cells. Journal of genetics and genomics = Yi chuan xue bao 44 (1): 39-49.

      Minor Comments

      - Authors should change immunofluorescence images to colorblind friendly colors. The current presentation of merged overlays makes it really difficult to interpret- I would strongly encourage inverted or at a minimum greyscale individual images of key proteins of interest.

      We sincerely thank the reviewer for their valuable suggestion regarding the presentation of immunofluorescence images. In response, we have converted the images in Figure 1C to greyscale individual images for each key protein of interest. This adjustment ensures that the figures are more accessible and interpretable, including for readers with color vision deficiencies.

      We hope this modification addresses the reviewer’s concern and improves the clarity of our data presentation. We are grateful for the constructive feedback, which has helped us enhance the overall quality of our figures.

      - On p. 8 text should be amended to 'Previous literature has documented MARK2's localization to the microtubules, microtubule-organizing center (MTOC), focal adhesions..'

      We sincerely thank the reviewer for their comment regarding the text on page 8. Considering the reasoning provided in response to question 2, where we clarified that MARK2's Golgi localization is not fully understood, we have decided to remove this section from the manuscript to maintain the accuracy and rigor of our study.

      We appreciate the reviewer’s attention to detail and constructive feedback, which has helped us improve the clarity and focus of our manuscript. 

      - In Fig.1A scale bars are not shown on individual channel images of CAMSAP or GM130

      We sincerely thank the reviewer for pointing out the omission of scale bars in the individual channel images of CAMSAP and GM130 in Figure 1A (Figure 1C in the revised manuscript). In response, we have added a scale bar (5 μm) to the CAMSAP2 channel, as shown in the revised Figure 1C. These updates have been described in the figure legend (page 29, line 21).

      We hope this modification addresses the reviewer’s concern and improves the accuracy and clarity of our figure presentation. We greatly appreciate the reviewer’s constructive feedback, which has helped enhance the quality of our manuscript.

      - In Fig. 1B the title should be amended to 'Colocalization of CAMSAP2/GM130'

      We sincerely thank the reviewer for their suggestion to amend the title in Figure 1B (Figure 1D in the revised manuscript). In response, we have updated the title to "Colocalization of CAMSAP2/GM130," as shown in the revised Figure 1D.

      We hope this modification addresses the reviewer’s concern and improves the clarity and accuracy of the figure. We greatly appreciate the reviewer’s valuable feedback, which has helped us refine the presentation of our results.

      - In Fig. 2F, 5A, and Sup Fig 3C scale bars have been presented vertically

      We sincerely thank the reviewer for pointing out the issue with the vertical orientation of scale bars in Figures 2F (Figure 2D in the revised manuscript), 5A, and Supplementary Figure 3C. In response, we have modified the scale bars in revised Figures 2D and 5A to a horizontal orientation for improved consistency and clarity. Additionally, Supplementary Figure 3C has been removed from the revised manuscript.

      We hope these adjustments address the reviewer’s concerns and enhance the overall presentation quality of the figures. We greatly appreciate the reviewer’s constructive feedback, which has helped us refine our manuscript.

      - Panels are not correctly aligned, and images are not evenly spaced or sized in multiple figures - Fig. 2F, 4D, Sup Fig. 1F, Sup Fig. 2C, Sup Fig. 3E, Sup Fig. 4C

      We sincerely thank the reviewer for pointing out the misalignment and uneven spacing or sizing of panels in multiple figures, including Figures 2F, 4D, Supplementary Figures 1F, 2C, 3E, and 4C (Figure 2D, 4D, Supplementary Figures 1F, 2C, and 3H in the revised manuscript; Supplementary Figure 3E was removed from our manuscript). In response, we have standardized the spacing and sizing of all panels throughout the manuscript to ensure consistency and improve visual clarity.

      We hope this modification addresses the reviewer’s concerns and enhances the overall presentation quality of our figures. We greatly appreciate the reviewer’s constructive feedback, which has helped us improve the organization and professionalism of our manuscript.

      - An uncolored additional data point is present in Fig. 3F

      We sincerely thank the reviewer for pointing out the presence of an uncolored additional data point in Figure 3F. In response, we have removed this data point from the revised figure to ensure accuracy and clarity.

      We hope this adjustment resolves the reviewer’s concern and improves the overall quality of the figure. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us refine our manuscript.

      - In Fig. 3A 'GAMSAP2/GM130' in the vertical axis label should be amended to 'CAMSAP2/GM130'

      We sincerely thank the reviewer for pointing out the error in the vertical axis label of Figure 3A. In response, we have corrected "GAMSAP2/GM130" to "CAMSAP2/GM130," as shown in the revised Figure 3I.

      We hope this correction resolves the reviewer’s concern and improves the accuracy of our figure. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us refine our manuscript.

      - In Fig 5A the green label should be amended to 'GFP-CAMSAP2' instead of 'GFP'

      We sincerely apologize for the confusion caused by our labeling in Figure 5A. To clarify, the green label “GFP” refers to the antibody used, while “GFP-CAMSAP2” is indicated at the top of the figure to specify the construct being analyzed.

      We hope this explanation resolves the misunderstanding and provides clarity regarding the labeling in Figure 5A. We greatly appreciate the reviewer’s feedback, which has allowed us to address this issue and improve the precision of our figure annotations.

      - The repeated use of contractions throughout the manuscript was distracting, I would strongly encourage removing these.

      We sincerely thank the reviewer for pointing out the distracting use of contractions in the manuscript. In response, we have removed and replaced all contractions with their full forms to improve the clarity and formal tone of the text.

      We hope this modification addresses the reviewer’s concern and enhances the readability and professionalism of our manuscript. We greatly appreciate the reviewer’s constructive feedback, which has helped us refine the quality of our writing.

      Reviewer #2: 

      Summary  

      This work by the Meng lab investigates the role of the proteins MARK2 and CAMSAP2 in the Golgi reorientation during cell polarisation and migration. They identified that both proteins interact together and that MARK2 phosphorylates CAMSAP2 on the residue S835. They show that the phosphorylation affects the localisation of CAMSAP2 at the Golgi apparatus and in turn influences the Golgi structure itself. Using the TurboID experimental approach, the author identified the USO1 protein as a protein that binds differentially to CAMSAP2 when it is itself phosphorylated at residue 835. Dissecting the molecular mechanisms controlling Golgi polarisation during cell migration is a highly complex but fundamental issue in cell biology and the author may have identified one important key step in this process. However, although the authors have made a genuine iconographic effort to help the reader understand their point of view, the data presented in this study appear sometimes fragile, lacking rigour in the analysis or over-interpreted. Additional analyses need to be conducted to strengthen this study and elevate it to the level it deserves.

      We sincerely thank the reviewer for their thoughtful evaluation and recognition of our study's significance in understanding Golgi reorientation during cell migration. We appreciate the constructive feedback regarding data robustness, clarity, and interpretation. In response, we have conducted additional analyses, revised data presentation, and ensured cautious interpretation throughout the manuscript. These changes aim to address the reviewer’s concerns comprehensively and strengthen the scientific rigor of our study.

      Major comments

      In order to conclude as they do about the putative role of USO1, the authors need to perform a siRNA/CRISPR of USO1 to validate its role in anchoring CAMSAP2 to the Golgi apparatus in a MARK2 phosphorylation-dependent manner. In other words, does depletion of USO1 affect the recruitment of CAMSAP2 to the Golgi apparatus?

      We sincerely thank the reviewer for their insightful suggestion regarding the role of USO1 in anchoring CAMSAP2 to the Golgi apparatus. In response, we performed USO1 knockdown using siRNA and quantified the Pearson correlation coefficient of CAMSAP2 and GM130 colocalization in control and USO1-knockdown cells.

      The results show that CAMSAP2 localization to the Golgi is significantly reduced in USO1-knockdown cells, confirming that USO1 plays a critical role in recruiting CAMSAP2 to the Golgi apparatus. These results are now presented in Figures 6E–G, and corresponding updates have been incorporated into the Results section (page 10, lines 36-37 in the revised manuscript).
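      For readers unfamiliar with the colocalization metric mentioned above, the following is a minimal sketch of how a Pearson correlation coefficient between two fluorescence channels can be computed. It is illustrative only and is not the authors' analysis pipeline: the function name `pearson_colocalization`, the nested-list image representation, and the optional cell `mask` are assumptions made for this example.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b, mask=None):
    """Pearson correlation between two same-shaped 2D intensity images.

    channel_a, channel_b: nested lists of pixel intensities (hypothetical
    stand-ins for, e.g., a CAMSAP2 channel and a GM130 channel).
    mask: optional nested list of booleans selecting pixels inside one cell.
    """
    xs, ys = [], []
    for r, (row_a, row_b) in enumerate(zip(channel_a, channel_b)):
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if mask is None or mask[r][c]:
                xs.append(float(a))
                ys.append(float(b))

    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pixels")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")

    # The coefficient ranges from -1 (anti-correlated) to +1 (perfectly correlated).
    return cov / sqrt(var_x * var_y)
```

      In practice one coefficient would be computed per cell, and the per-cell values would then be compared between control and knockdown conditions.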

      We hope this additional experiment addresses the reviewer’s concern and strengthens our conclusions regarding the role of USO1. We are grateful for the reviewer’s constructive feedback, which has greatly improved the robustness of our study.  

      It is not clear from this study exactly when and where MARK2 phosphorylates CAMSAP2. What is the result of overexpression of the two proteins in their respective localisation to the Golgi apparatus? As binding between CAMSAP2 and MARK2 appears robust in the immunoprecipitation assay, this should be readily investigated. 

      We sincerely thank the reviewer for their insightful comments and questions. To address the role of MARK2 in regulating CAMSAP2 localization to the Golgi apparatus, we overexpressed GFP-MARK2 in cells and compared its effects on CAMSAP2 localization to the Golgi with control cells overexpressing GFP alone. Our results show that CAMSAP2 localization to the Golgi is significantly increased in GFP-MARK2-overexpressing cells, as shown in Supplementary Figures 3C and 3E. Corresponding updates have been incorporated into the Results section (page 8, lines 25-27 in the revised manuscript).

      Regarding the question of how MARK2 itself localizes to the Golgi, we are currently unable to fully elucidate the underlying mechanism. Therefore, we have removed the discussion of MARK2’s Golgi localization from the manuscript to ensure scientific accuracy. Consequently, we have not conducted experiments to assess the effects of CAMSAP2 overexpression on MARK2’s localization to the Golgi.

      We hope this explanation clarifies the reviewer’s concerns. We are grateful for the reviewer’s constructive feedback, which has guided us in improving the clarity and focus of our study.

      To strengthen their results, can the author map the interaction domains between CAMSAP2 and MARK2? The authors have at their disposal all the constructs necessary for this dissection.

      We sincerely thank the reviewer for their insightful suggestion to map the interaction domains between CAMSAP2 and MARK2. In response, we performed immunoprecipitation experiments using truncated constructs of CAMSAP2. Our results reveal that MARK2 interacts specifically with the C-terminus (1149F) of CAMSAP2, as shown in Supplementary Figures 3A and 3B. Corresponding updates have been incorporated into the Results section (page 7, lines 41-42 and page 8, line 1 in the revised manuscript).

      We hope this additional analysis addresses the reviewer’s suggestion and further strengthens our conclusions. We greatly appreciate the reviewer’s constructive feedback, which has helped improve the depth of our study.

      Minor comments

      Sup-fig1  

      H: It is not clear if the polarisation experiment has been repeated three times (as it should) and pooled or is just the result of one experiment?

      We sincerely apologize for the lack of clarity regarding the experimental details for Supplementary Figure 1H. To clarify, the polarization experiment was repeated three times, and the results were pooled to generate the data presented. We have updated the figure legend for Supplementary Figure 1H to explicitly state this information (page 35, lines 27-29 in the revised manuscript).

      We hope this clarification resolves the reviewer’s concern. We greatly appreciate the reviewer’s careful review and constructive feedback, which have helped us improve the accuracy and transparency of our manuscript.

      Sup-fig2  

      C: "Immunofluorescence staining plots" formula used in the legend is not clear. Which condition is presented in the panel, parental HT1080 or CAMSAP2 KO cells?  

      We thank the reviewer for pointing out the lack of clarity regarding the conditions presented in Supplementary Figure 2C. To clarify, the immunofluorescence staining plots shown in this panel are from parental HT1080 cells. We have updated the figure legend to include this information (page 36, line 14 in the revised manuscript).

      We hope this clarification resolves the reviewer’s concern and improves the transparency of our data presentation. We greatly appreciate the reviewer’s feedback, which has helped us refine the manuscript.

      Figure 1  

      D: In the plot, the colour of the points for the "red cells" are red but the one for the "blue cells" are green, this is confusing.

      E: Once again, the colour choice is confusing as blue cells (t=0.5h) are quantified using red dots and red cells (t=2h) quantified using green dots. The t=0h condition should be quantified as well and added to the graph.  

      F: Representative CAMSAP2 immunofluorescence pictures for the three time points should be provided in addition to the drawings.  

      We thank the reviewer for their valuable comments regarding Figure 1D (revised Figure 1E), Figure 1E (revised Figure 1B), and Figure 1F (revised Supplementary Figure 2C).

      - Figure 1D (revised Figure 1E): we have modified the x-axis labels and adjusted the color scheme of the data points to ensure consistency and avoid confusion.

      - Figure 1E (revised Figure 1B): we have updated the x-axis and included the quantification of the t=0h condition, which has been added to the graph.

      - Figure 1F (revised Supplementary Figure 2C): we have provided representative immunofluorescence images of CAMSAP2 for the three time points to complement the schematic drawings.

      We hope these revisions address the reviewer’s concerns and improve the clarity and completeness of our data presentation. We greatly appreciate the reviewer’s constructive feedback, which has significantly contributed to enhancing our manuscript.

      Figure 2  

      A: No methodology in the material and methods is provided for this analysis.  

      B: Can the authors be more precise regarding the source of the CAMSAP2 interactants? Can the author provide the citation of the publication describing the CAMSAP2-MARK2 interaction?  

      D: Genotyping for the MARK2 KO cell line should be provided the same way it was provided for the CAMSAP2 cell line in Sup-fig1. "MARK2 was enriched around the Golgi apparatus in a  significant proportion of HT1080 cells": which proportion of the cells?  

      F: The time point of fixation is missing  

      G: It is not clear if the polarisation experiment has been repeated three times (as it should) and pooled or is just the result of one experiment?  

      We thank the reviewer for their detailed comments and suggestions regarding Figure 2. Below, we provide clarifications and outline the modifications made:

      - Figure 2A: The methodology for this analysis has been added to section 5.14 (Data statistics). Specifically, we have stated: “GO analysis of proteins was plotted using https://www.bioinformatics.com.cn, an online platform for data analysis and visualization” (page 26 lines 5-6 in the revised manuscript).

      - Figure 2B: The CAMSAP2 interactants were derived from the study by Wu et al., 2016, which provides the source of these interactants. The interaction between CAMSAP2 and MARK2 is referenced from Zhou et al., 2020. These citations have been added to the relevant sections of the manuscript (page 30, lines 10-11 and 13-14).

      - Figure 2D (removed in the revised manuscript): Genotyping for the MARK2 KO cell line has been provided in the same format as for the CAMSAP2 KO cell line in Figure 2G. Additionally, as the MARK2 Golgi localization discussion cannot yet be fully elucidated, we have removed this portion from the manuscript.

      - Figure 2F (revised Figure 2D): The time point of fixation, which occurred 2 hours after the scratch wound assay, has been added to the figure legend (page 30, lines 15-16).

      - Figure 2G (revised Figure 2E-F): The polarization experiment was repeated three times, and the results were pooled. This information has been included in the figure legend (page 30, lines 26 and 29).

      We hope these updates address the reviewer’s concerns and improve the clarity and completeness of the manuscript. We are grateful for the reviewer’s constructive feedback, which has greatly enhanced the rigor of our study.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      Sup-fig3  

      E: Although colocalisation between CAMSAP2 and MARK2 is clear in your serum conditions in HT1080 and RPE1 cells, the deletion domain analysis appears weak and insufficient to implicate the role of the spacer domain. This part should be deleted or strengthened, but the data do not satisfactorily support your conclusion as it stands.  

      We sincerely thank the reviewer for their critical comments regarding the deletion domain analysis of MARK2 and its role in colocalization with CAMSAP2. As the current data do not satisfactorily support our conclusions, we have removed all related content on MARK2 and the deletion domain analysis from the manuscript to maintain scientific rigor.

      We appreciate the reviewer’s valuable feedback, which has helped us refine and improve the quality and focus of our study.

      Figure 3  

      A: Can the reduced CAMSAP2 Golgi localisation phenotype be rescued by the overexpression of MARK2 cDNA in the MARK2 KO cells?  

      F: Presence of a white dot on the HT1080 plot  

      G: The composition of the homogenization buffer is not indicated in the material and methods  

      We thank the reviewer for their valuable comments and suggestions regarding Figure 3. Below, we detail the modifications made:

      - Figure 3A: To address whether the reduced CAMSAP2 Golgi localization phenotype can be rescued, we overexpressed MARK2 cDNA in MARK2 KO cells. Our results show that overexpression of MARK2 successfully rescues the reduced CAMSAP2 localization to the Golgi, as demonstrated in Supplementary Figures 3C and 3E (page 8, lines 5-7).

      - Figure 3F: We have removed the white dot on the HT1080 plot to ensure clarity and accuracy.

      - Figure 3G: The composition of the homogenization buffer used in the experiment has been added to the Materials and Methods section for completeness (page 24, lines 34-41 and page 25, lines 1-10).

      We hope these revisions address the reviewer’s concerns and enhance the clarity and rigor of our study. We are grateful for the reviewer’s constructive feedback, which has significantly improved the quality of our manuscript.

      Figure 4  

      B: Quantification of the effect of the S835A mutation should be provided  

      D: Top left panel: Why does the HA antibody stain Golgi structures in the absence of HA-CAMSAP2 transfection? If the HA antibody has nonspecific affinity toward the Golgi apparatus, maybe it is not a good tag to use in this assay?

      E: The number of cells studied should be standardized. 119 cells were analyzed in the CAMSAP KO vs only 35 cells in the CAMSAP2 KO (HA-CAMSAP2-S835D) conditions. This could introduce strong bias to the analysis. Furthermore the CAMSAP2 S835A seems to provide a certain level of rescue. It would be interesting to see what is the result of the T test between the HT1080 and HA-CAMSAP S835A conditions.  

      We thank the reviewer for their thoughtful comments and suggestions regarding Figure 4. Below, we detail the revisions and clarifications made:

      - Figure 4B: The S835A mutation renders CAMSAP2 non-phosphorylatable by MARK2. This conclusion is based on our experimental observations and previously reported mechanisms.

      - Figure 4D: The HA antibody does not exhibit non-specific affinity toward the Golgi apparatus. The observed labeling in the top left panel was due to an error in our annotation. We have corrected the label, replacing "HA" with "CAMSAP2" to accurately reflect the experimental conditions.

      - Figure 4E: To standardize the number of cells analyzed across conditions, we reduced the number of CAMSAP2 KO cells analyzed to 50 and balanced the sample sizes for comparison. Additionally, we performed a t-test between the HT1080 and HA-CAMSAP2 S835A conditions. The results support that CAMSAP2 S835A provides partial rescue, as reflected in the updated analysis (page 32, lines 19-23).
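      As a point of reference for the comparison described for Figure 4E, the sketch below computes a two-sample t statistic. The response does not state which t-test variant was used, so this example assumes Welch's unequal-variance form, which does not require equal sample sizes. The function name `welch_t` and the example numbers are hypothetical and are not data from the manuscript.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumption: Welch's unequal-variance t-test; the source does not
    specify the variant that was actually applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")

    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b

    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-cell measurements for two conditions (not real data).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

      Converting the statistic to a p-value requires the t distribution with `dof` degrees of freedom, which a statistics package would normally supply.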

      We hope these revisions address the reviewer’s concerns and improve the accuracy and reliability of our results. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the quality of our study.

      Figure 6  

      6A: The wound position should be indicated on the picture.  

      6B: Given that microtubule labelling is present on the vast majority of the cell surface, this type of quantification provides very little information using conventional light microscopy and should not be used to conclude any change in the microtubule network using Pearson's coefficient. The text describing Figures 6A and 6B needs to be rewritten, as I do not understand what the authors want to say. "In cells located before the wound edge...": I do not understand how a cell could be located before the wound edge. Which figure corresponds to the trailing edge of the wounding?

      We thank the reviewer for their valuable comments on Figure 6A (revised Supplementary Figure 6E) and Figure 6B (revised Supplementary Figure 6F). Below, we detail the modifications made:

      - Figure 6A (revised Supplementary Figure 6E): we have added arrows to indicate the wound position, providing clearer guidance for interpreting the image.

      - Figure 6B (revised Supplementary Figure 6F): we revised our quantification method based on the approach used in the literature (Wu et al., 2016). Specifically, we analyzed the relationship between microtubules and the Golgi apparatus in cells at the leading edge of the wound. The x-axis represents the distance from the Golgi center, while the y-axis shows the normalized radial fluorescence intensity of microtubules and the Golgi apparatus.
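      To illustrate the kind of quantification described for the revised Supplementary Figure 6F, the sketch below computes a normalized radial intensity profile around a chosen center. It is a simplified illustration under stated assumptions, not the authors' or Wu et al.'s exact procedure: the function name `radial_profile`, the integer-radius binning, and the normalization to the peak bin are all choices made for this example.

```python
from math import hypot


def radial_profile(image, center, max_radius):
    """Mean intensity per integer-radius bin around `center`, scaled to a peak of 1.

    image: nested list of pixel intensities (one channel).
    center: (row, col) of the reference point, e.g. a Golgi centroid (assumed given).
    max_radius: largest radius, in pixels, to include.
    """
    center_row, center_col = center
    sums = [0.0] * (max_radius + 1)
    counts = [0] * (max_radius + 1)

    for row_idx, row in enumerate(image):
        for col_idx, value in enumerate(row):
            # Assign each pixel to the nearest integer radius.
            radius = int(round(hypot(row_idx - center_row, col_idx - center_col)))
            if radius <= max_radius:
                sums[radius] += value
                counts[radius] += 1

    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    peak = max(means)
    # Normalize so that profiles from different cells or channels are comparable.
    return [m / peak if peak > 0 else 0.0 for m in means]


# Tiny hypothetical image with a bright center pixel.
profile = radial_profile([[1, 2, 1], [2, 8, 2], [1, 2, 1]], center=(1, 1), max_radius=1)
```

      Running the same function on the microtubule and Golgi channels with a shared center would give two curves that can be plotted against distance.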

      Additionally, we revised the accompanying text for clarity and accuracy. The original description:

      “In cells located before the wound edge, the Golgi apparatus maintained a ribbon-like shape, with a higher density of microtubules. In contrast, at the trailing edge of the wounding, the Golgi apparatus appeared more as stacks around the nucleus, with fewer microtubules”  was replaced with:

      “Finally, to comprehensively understand the dynamics between non-centrosomal microtubules and the Golgi apparatus during Golgi reorientation, we conducted cell wound-healing experiments (Supplementary Figure 6 E-F). Our observations revealed notable changes in the Golgi apparatus and microtubule network distribution in relation to the wounding. These findings corroborate our earlier results and suggest a highly dynamic interaction between the Golgi apparatus and microtubules during Golgi reorientation” (Revised manuscript page 11 lines 3-10).

      We hope these changes address the reviewer’s concerns and improve the clarity and robustness of our study. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the presentation and interpretation of our data.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      Reviewer #3:  

      Summary  

      In this study, Xu et al. analyzed the wound healing process of HT1080 cells to elucidate the molecular mechanisms by which the Golgi apparatus exhibits transient dispersion before reorienting to the wound edge in the compact assembly structure. They focused on the role of the microtubule minus-end binding protein CAMSAP2, which mediates the linkage between microtubules and the Golgi membrane. At first, they noticed that CAMSAP2 transiently lost Golgi colocalization during the initial phase of the wound healing process. They further found that the cell polarity-regulating kinase MARK2 binds and phosphorylates S835 of CAMSAP2, thereby enhancing the interaction between CAMSAP2 and the Golgi protein Uso1. Together with the phenotypes of CAMSAP2, MARK2, and Uso1 KO cells, these authors argue that the MARK2-dependent phosphorylation of CAMSAP2 plays an important role in the reassembly and reorientation of the Golgi apparatus after a transient dispersion observed during the wound healing process.

      We sincerely thank the reviewer for their thoughtful summary of our study and constructive feedback. Your comments have been invaluable in refining our research and enhancing the clarity and impact of our manuscript.

      Major comments

      (1) The premise of this study was that during the wound healing process, the Golgi apparatus exhibits transient dispersion before reorientation to the front of the nucleus.  

      In the first place, this claim has not been well established in previous studies or this paper. Therefore, the authors should present a proof of this claim in a clearer manner.  

      To introduce this cellular event, the authors cite several papers in the introduction (page 4) and the results (page 6) sections. However, many papers cited are review articles, and some of them do not describe this change in the Golgi assembly structure before reorientation. Only two original articles discussed this phenomenon (Bisel et al. 2008 and Wu et al. 2016), and direct evidence was provided by only one paper (Wu et al. 2016) in which changes in the Golgi apparatus in wound-healing RPE1 cells were recorded by live imaging (Fig.7A in Wu et al. 2016).

      Furthermore, it should be noted that this previous paper demonstrated that depletion of CAMSAP2 inhibits Golgi dispersion. Obviously, this conclusion is inconsistent with their statement to introduce this study (page 4) that “This emphasizes CAMSAP2's role in sustaining Golgi integrity during critical cellular events like migration.” In addition, it also contradicts the authors' model of the present paper (Fig. 6E), which argued that disruption of the Golgi association of CAMSAP2 facilitates the Golgi dispersion.

      We sincerely thank the reviewer for their detailed comments and for providing us with the opportunity to clarify the premise and conclusions of our study. Below, we address the main concerns raised:

      First, to provide direct evidence of Golgi apparatus changes during the wound-healing process, we conducted live-cell imaging experiments. Our observations, presented in revised Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus exhibits a transient dispersion state before reorienting toward the leading edge of the nucleus during migration.

      Regarding the interpretation of previous studies, we acknowledge the reviewer’s concerns about the citation of review articles. To address this, we have revisited the literature and clarified that the phenomenon of Golgi dispersion during reorientation has been directly demonstrated by Wu et al. (2016), where live imaging of wound-healing RPE1 cells showed this dynamic behavior. Furthermore, we note that the Wu et al. paper explicitly demonstrates that CAMSAP2 depletion promotes Golgi dispersion, contrary to the reviewer’s interpretation that "depletion of CAMSAP2 inhibits Golgi dispersion."

      Our model focuses on the role of CAMSAP2 in restoring the Golgi from a transiently dispersed structure back to an intact ribbon-like structure during reorientation. Specifically, we propose that during this process, the disruption of CAMSAP2’s association with the Golgi affects this restoration, rather than directly promoting Golgi dispersion as suggested by the reviewer. We believe this distinction aligns with our data and the existing literature.

      To strengthen the background of our study, we have revised the introduction and results sections (page 6, lines 6-13 and page 7, lines 1-17) to minimize reliance on review articles and have provided more explicit citations to original research papers. We hope this addresses the reviewer’s concern about the sufficiency of the cited literature.

      We trust these clarifications and revisions resolve the reviewer’s concerns and enhance the robustness of our study. Once again, we are grateful for the reviewer’s constructive feedback, which has greatly helped refine our manuscript.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      The authors did not provide experimental data for this temporal change in the Golgi assembly structures during the wound-healing process of HT1080 that they analyzed. They only provide an illustration of wound-healing cells (Fig.1F), in which cells are qualitatively discriminated and colored based on the Golgi states, without indicating the experimental basis of the discrimination.

      According to their ambiguous descriptions in the text (page7), the reader can speculate that Fig. 1F is illustrated based on the images in Supplementary Fig. 2C. However, because of the low quality and presentation style of these data, it is impossible to recognize the assembly structures of the Golgi apparatus in wound-edge cells.  

      If the authors hope to establish this premise claim for the present paper, they should provide their own data corresponding to the present Supplementary Fig. 2C in more clarity and present qualitative data verifying this claim, as Wu et al. did in Fig. 7A in their paper.

      We sincerely thank the reviewer for their constructive feedback and the opportunity to address the concern regarding the lack of experimental data supporting the temporal changes in Golgi assembly during the wound-healing process.

      To establish this premise, we conducted live-cell imaging experiments to observe the dynamic changes in the Golgi apparatus during directed cell migration. Our data, now presented in Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus undergoes a transient dispersed state before reorganizing into an intact structure. These findings provide direct experimental evidence supporting our claim.

      In addition, we have revised the data originally presented in Supplementary Figure 2C and enhanced its quality and presentation style. This supplementary figure now includes clearer images and annotations to better illustrate the Golgi assembly structures in wound-edge cells. The improved data presentation aligns with the standards set by Wu et al. (2016) and provides qualitative support for our observations.

      We hope these additions and revisions address the reviewer’s concerns and strengthen the scientific rigor and clarity of our manuscript. We are grateful for the reviewer’s valuable suggestions, which have significantly improved the quality of our study.

      References:

      Wu, J., de Heus, C., Liu, Q., Bouchet, B.P., Noordstra, I., Jiang, K., Hua, S., Martin, M., Yang, C., Grigoriev, I., et al. (2016). Molecular Pathway of Microtubule Organization at the Golgi Apparatus. Dev Cell 39 (1): 44-60.

      (2) In Fig. 1A-D, the authors claim that CAMSAP2 dissociates from the Golgi apparatus in cells "that have not yet completed Golgi reorientation and exhibit a transitional Golgi structure, characterized by relative dispersion and loss of polarity (page 7)." However, in these analyses, they do not analyze the initial stage (0.5 h after wound addition) of cells facing the wound edge, as they do in Supplementary Fig. 2C. Instead, they analyze cells separated from the wound edge at 2 h after wound addition when the wound-edge cells complete their polarization. These data are highly misleading because there is no evidence that the cells separated from the wound edge are really in the transitional state before polarization.

      In this regard, Fig. 1E shows the analysis of the wound-edge cells at 0.5 and 2 h after the addition of wound, which provides suitable data to verify the authors' claim. However, the corresponding legend indicates that these statistical data are based on the illustration in Fig. 1F, which is probably based on highly ambiguous data in Supplementary Fig. 2C (see above).  

      Taken together, I strongly recommend the authors to remove Fig.1A-D. Instead, they should include the improved figure corresponding to the present Supplementary Fig.2C and present its statistical analysis similar to the present Fig.1E for this claim.

      We sincerely thank the reviewer for their constructive feedback and recommendations. Below, we address the concerns raised regarding Figure 1A-D and Supplementary Figure 2C.

      To provide stronger evidence for the transitional state of the Golgi apparatus during reorientation and the dynamic regulation of CAMSAP2 localization, we conducted live-cell imaging experiments. These results, now presented in Supplementary Figure 2A, clearly demonstrate that the Golgi apparatus undergoes a transitional state characterized by dispersion before reorienting toward the leading edge.

      Additionally, we analyzed fixed wound-edge cells at different time points during directed migration to observe CAMSAP2’s colocalization with the Golgi apparatus. The results, shown in Figures 1A and 1B, reveal dynamic changes in CAMSAP2 localization, confirm its regulation during Golgi reorientation, and include a corresponding statistical analysis (page 7, lines 1-17).

      These updates ensure that our claims are supported by robust and unambiguous data.

      We hope these revisions address the reviewer’s concerns and provide clear and reliable evidence for the transitional state of the Golgi apparatus and CAMSAP2’s dynamic regulation. We are grateful for the reviewer’s constructive suggestions, which have greatly improved the quality and focus of our manuscript.

      (3) In Supplementary Fig. 5 and Fig. 4, the authors claim that MARK2 phosphorylates S835 of CAMSAP2.  

      There are many issues to be addressed. Otherwise, the above claim cannot be assumed to be reliable.  

      First, the descriptions (in the text and method sections) and figures (Supplementary Fig.5) concerning the in vitro kinase assay and subsequent phosphoproteomic analysis are too immature and contain many errors.  

      Legend to Supplementary Fig. 5 is too immature for comprehension. It should be completely rewritten in a more comprehensive manner. The figure in Supplementary Fig. 5C is also too immature for understanding. They simply paste raw mass spectrometric data without any modification for presentation.  

      We sincerely apologize for the lack of clarity and inaccuracies in the original descriptions and figure legends for the in vitro kinase assay and phosphoproteomic analysis. We greatly appreciate the reviewer’s detailed comments, which have allowed us to address these issues comprehensively.

      To improve clarity and accuracy, we have rewritten the figure legend for the original Supplementary Figure 5 (now Supplementary Figure 4) as follows:

      (A): CBB staining of a gel with GFP-CAMSAP2, GST, and GST-MARK2. GFP-CAMSAP2 was expressed in Sf9 cells and purified. GST and GST-MARK2 were expressed in E. coli and purified.

      (B): Western blot analysis of an in vitro kinase assay. GST or GST-MARK2 was incubated with GFP-CAMSAP2 in kinase buffer (50 mM Tris-HCl pH 7.5, 12.5 mM MgCl2, 1 mM DTT, 400 μM ATP) at 30°C for 30 minutes. Reactions were stopped by boiling in the loading buffer.

      (C): Detection of phosphorylation at S835 in CAMSAP2 by mass spectrometry. The observed mass increases in b4, b5, b6, b7, b8, b10, b11, and b12 fragments indicate phosphorylation at Ser835.

      (D): Kinase assay samples analyzed using Phos-tag SDS-PAGE. HEK293 cells were cotransfected with the indicated plasmids. Band shifts of CAMSAP2 mutants were examined via western blot. Phos-tag was used in SDS-PAGE, and arrowheads indicate the shifted bands caused by phosphorylation.
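      The reasoning behind panel (C) is that a phosphate group adds about 79.966 Da (HPO3, monoisotopic) to every b-ion that contains the modified residue, while shorter b-ions are unchanged. The sketch below illustrates this with a hypothetical peptide: the sequence `GALSPEK` and the function name `b_ion_mz` are invented for the example and are not the CAMSAP2 peptide analyzed in the study. Residue masses are standard monoisotopic values, and only the residues used here are listed.

```python
# Monoisotopic residue masses in Da (subset needed for the hypothetical peptide).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "L": 113.08406, "E": 129.04259, "K": 128.09496,
}
PHOSPHO_SHIFT = 79.96633  # mass added by HPO3
PROTON = 1.007276         # mass of the charge-carrying proton


def b_ion_mz(peptide, phospho_index=None):
    """Singly charged b-ion m/z values b1..b(n-1) for `peptide`.

    phospho_index: 0-based position of a phosphorylated residue, or None.
    """
    running_mass = 0.0
    series = []
    for index, residue in enumerate(peptide[:-1]):
        running_mass += RESIDUE_MASS[residue]
        if index == phospho_index:
            running_mass += PHOSPHO_SHIFT
        series.append(running_mass + PROTON)
    return series


# Hypothetical peptide with the serine at position 4 (0-based index 3).
unmodified = b_ion_mz("GALSPEK")
phosphorylated = b_ion_mz("GALSPEK", phospho_index=3)
shifts = [p - u for p, u in zip(phosphorylated, unmodified)]
```

      In this toy case b1 to b3 are unshifted and b4 onward carry the extra mass, which mirrors the pattern of shifted fragments described in the legend.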

      To address the reviewer’s concern about Supplementary Figure 5C, we have reformatted the mass spectrometry data to improve readability and presentation quality. The revised figure includes clearer annotations and graphical representations of the mass spectrometric evidence for phosphorylation at S835.
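
      The reasoning behind panel (C) can be illustrated with a short calculation: a phosphate group adds about 79.966 Da to a residue, so every b-ion that contains the modified residue is shifted by that amount while shorter b-ions are not. The sketch below is only an illustration under stated assumptions. The peptide sequence, the helper names `b_ions` and `shifted_b_ions`, and the tolerance are hypothetical and are not taken from the manuscript; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for a few amino acids (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276   # mass of a proton (Da), for singly charged ions
PHOSPHO = 79.96633  # mass added by one phosphate group (HPO3), Da


def b_ions(peptide, phospho_index=None):
    """Return singly charged b-ion m/z values b1..b(n-1).

    phospho_index is the 0-based position of a phosphorylated residue,
    or None for the unmodified peptide.
    """
    masses = []
    running = PROTON
    for i, aa in enumerate(peptide[:-1]):
        running += RESIDUE_MASS[aa]
        if i == phospho_index:
            running += PHOSPHO
        masses.append(running)
    return masses


def shifted_b_ions(peptide, phospho_index, tol=0.01):
    """Return the b-ion numbers whose m/z increases by one phosphate mass."""
    plain = b_ions(peptide)
    modified = b_ions(peptide, phospho_index)
    return [n + 1 for n, (p, m) in enumerate(zip(plain, modified))
            if abs((m - p) - PHOSPHO) < tol]


# Hypothetical 14-residue peptide with a phosphoserine at position 4.
# This is NOT the CAMSAP2 peptide from the manuscript.
print(shifted_b_ions("GALSPVTLEAGVLK", 3))
# → [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

      In this toy case the shift begins at b4 and continues through all longer b-ions, which is the same pattern the legend uses to place the modification on a specific serine.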

      We believe these updates enhance the comprehensibility and reliability of our data, providing robust support for our claim that MARK2 phosphorylates CAMSAP2 at S835. We hope these revisions address the reviewer’s concerns and demonstrate our commitment to improving the quality of our manuscript.

      The readers cannot understand how the authors purified GFP-CAMSAP2 for the kinase assay.

      The method section incorrectly states that the product was purified using Ni-resin.  

      We thank the reviewer for their comment regarding the purification of GFP-CAMSAP2 for the kinase assay. We would like to clarify that GFP-CAMSAP2 carries a His-tag, which allows for purification using Ni-resin, as described in the Methods section (page 23, Lines 32-40). Therefore, the description in the Methods section is correct.

      To avoid any potential misunderstanding, we have revised the Methods section to provide more detailed and precise descriptions of the purification process. Specifically, GFP-CAMSAP2 was cloned into the pOCC6_pOEM1-N-HIS6-EGFP vector, which includes a His-tag, and was expressed in Sf9 cells. The His-GFP-CAMSAP2 protein was purified using Ni-resin chromatography. Relevant details have been added to the Methods section (page 21, Lines 34-36: “CAMSAP2 was cloned into the pOCC6_pOEM1-N-HIS6-EGFP vector expressed in Sf9, purified as His-GFP-CAMSAP2.”; page 23, Lines 32-33: “His-GFP-CAMSAP2 was cotransfected with bacmids into Sf9 cells to generate the passage 1 (P1) virus.”).

      We hope these clarifications and revisions address the reviewer’s concern and improve the comprehensibility of our experimental details. We appreciate the reviewer’s feedback, which has helped us refine the manuscript.

      In this relation, GST and GST-MARK2 are described as having been purified from Sf9 insect cells in the text section (page9) and legend to Supplementary Fig. 5, but from E. coli in the method section. Which is correct?  

      We thank the reviewer for pointing out the inconsistencies in the descriptions regarding the source of GST and GST-MARK2. To clarify, both GST and GST-MARK2 were purified from E. coli, as stated in the Methods section (page 23, Lines 26-31). We have corrected the erroneous descriptions in the main text (page 8, Lines 35-36) and the legend to Supplementary Figure 4 to ensure consistency.

      Additionally, we have updated the legend for Supplementary Figure 4A to state the sources of each protein explicitly:

      “GFP-CAMSAP2 were expressed in Sf9 cells and purified. GST and GST-MARK2 were expressed in E. coli and purified.” (page 38, Lines 2-3)

      These revisions ensure that the experimental details are accurate and consistent across the manuscript, eliminating any potential confusion. We appreciate the reviewer’s careful review and constructive feedback, which have helped us improve the clarity and reliability of our study.

      Because the phosphoproteomic data (Supplementary Fig. 5C) are not provided clearly, the experimental data for Fig.4A, in which possible CAMSAP2 phosphorylation sites are illustrated, are completely unknown. For me, it is highly strange that only the serine residues are listed in Fig. 4A.

      We sincerely thank the reviewer for raising this important point regarding Figure 4A and the phosphoproteomic data in Supplementary Figure 5C.

      - Phosphorylation Sites in Figure 4A

      The phosphorylation sites illustrated in Figure 4A are derived from our analysis of the original mass spectrometry data. These sites were included based on their high confidence scores and data reliability. Importantly, only serine residues met the stringent criteria for inclusion, as no threonine or tyrosine residues had sufficient evidence for phosphorylation. To clarify this, we have updated the figure legend for Figure 4A (page 32, Lines 3-7).

      - Improvements to Supplementary Figure 5C (Supplementary Figure 4D in the revised manuscript)

      To enhance transparency and clarity, we have reformatted Supplementary Figure 4D to include clearer annotations. The revised figure highlights the phosphopeptides used to identify the phosphorylation sites and provides a more comprehensive presentation of the mass spectrometry data. To clarify this, we have updated the figure legend for Supplementary Figure 4D (page 38, Lines 11-13).

      - Data Availability

      We will follow the journal’s guidelines by uploading the raw mass spectrometry data to the required public database upon manuscript acceptance. This ensures that the data are accessible and reproducible in compliance with journal standards.

      We hope these clarifications and updates address the reviewer’s concerns and improve the reliability and comprehensibility of our data presentation. We greatly appreciate the reviewer’s constructive feedback, which has helped us enhance the rigor and clarity of our manuscript.

      Considering the crude nature of the GST-MARK2 sample used for the in vitro kinase assay (Supplementary Fig. 5A), it is unclear whether MARK2 is responsible for all phosphorylation sites on CAMSAP2 detected in the phosphoproteomic analysis. Furthermore, if GFP-CAMSAP2 was purified from Sf9 insect cells, these sites might have been phosphorylated before incubation for the in vitro kinase assay. The authors should address these issues by including a negative control using the kinase-dead mutant of MARK2 in their in vitro kinase assay.

      We sincerely thank the reviewer for raising these important points regarding the potential pre-phosphorylation of GFP-CAMSAP2 and the role of MARK2 in the phosphorylation sites detected in our analysis.

      To address the possibility that GFP-CAMSAP2 may have been pre-phosphorylated during its expression in Sf9 insect cells, we conducted an in vitro comparison. Specifically, we compared the band shifts observed in GST-MARK2 + GFP-CAMSAP2 versus GST + GFP-CAMSAP2 under identical conditions. As shown in Supplementary Figure 4B, the GST-MARK2 + GFP-CAMSAP2 group exhibited a clear upward band shift compared to the GST + GFP-CAMSAP2 group, indicating additional phosphorylation events induced by MARK2.

      Regarding the inclusion of a kinase-dead MARK2 mutant as a negative control, we acknowledge this as a valuable suggestion for further confirming the specificity of MARK2 in phosphorylating CAMSAP2. While this experiment is not currently included, we plan to conduct it in our future studies to strengthen our findings.

      We hope this clarification and the provided evidence address the reviewer’s concerns. We are grateful for this constructive feedback, which has helped us critically evaluate and refine our experimental approach.

      (4) In Supplementary Fig.6A-C and Fig.5A-B, the authors claim that the phosphorylation of CAMSAP2 S835 is required for restoring the reduced reorientation of the Golgi in wound-healing cells and the delay in wound closure observed in MARK2 KO cells.  

      If the aforementioned claim is adequately supported by experimental data, it indicates that the defects in Golgi repolarization and wound closure in MARK2 KO cells can be mainly attributed to the reduced phosphorylation of S835 of CAMSAP2 in HT1080. Considering the presence of many well-known substrates of MARK2 for regulating cell polarity, this claim is highly striking.  

      However, to strongly support this conclusion, the authors should first perform a rescue experiment using MARK2 KO cells exogenously expressing MARK2. This step is essential for determining whether the defects observed in MARK2 KO cells are caused by the loss of MARK2 expression, but not by other artificial effects that were accidentally raised during the generation of the present MARK2 KO clone.  

      We sincerely thank the reviewer for their insightful suggestion regarding the rescue experiment to confirm that the defects observed in MARK2 KO cells are specifically caused by the loss of MARK2 expression.

      To address this, we performed a rescue experiment in MARK2 KO HT1080 cells by exogenously expressing GFP-MARK2. Our results, presented in Supplementary Figures 3C-E, demonstrate that GFP-MARK2 expression successfully restores the localization of CAMSAP2 on the Golgi apparatus in MARK2 KO cells.

      These findings strongly support the conclusion that the defects in Golgi architecture and CAMSAP2 Golgi localization are directly attributable to the loss of MARK2 expression, rather than any artificial effects potentially introduced during the generation of the MARK2 KO clone.

      We hope these additional experimental results address the reviewer’s concerns and provide robust evidence for the role of MARK2 in regulating Golgi reorientation and wound closure. We are grateful for the reviewer’s constructive feedback, which has significantly improved the rigor and clarity of our study.

      In addition, to evaluate the impact of the rescue effect of CAMSAP2, the authors should include the data of wild-type HT1080 and MARK2 KO cells in Fig. 5B to reliably demonstrate the aforementioned claim.  

      We thank the reviewer for their valuable suggestion to include data from wild-type HT1080 and MARK2 KO cells in Figure 5A-C to better evaluate the rescue effects of CAMSAP2.

      In response, we have incorporated data from wild-type HT1080 and MARK2 KO cells into Figure 5A-C. These additions provide a comprehensive comparison and further demonstrate the impact of CAMSAP2-S835A and CAMSAP2-S835D on Golgi reorientation relative to the wild-type and MARK2 KO conditions.

      These changes are reflected in Figures 5A-C.

      We hope these updates address the reviewer’s concerns and strengthen the reliability of our conclusions. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the robustness of our study.

      Principally, before checking the rescue effects in MARK2 KO cells, the authors should examine the rescue activity of the CAMSAP2 S835 mutants in restoring the reduced reorientation of the Golgi in wound-healing cells and the delay in wound closure observed in CAMSAP2 KO cells (Supplementary Fig.1F-H and Supplementary Fig.2A, B). These experiments are more essential experiments to substantiate the authors' claim.

      We thank the reviewer for their insightful suggestion to examine the rescue activity of CAMSAP2 S835 mutants in CAMSAP2 KO cells to further substantiate our claims.

      In Figure 4D-F, we observed significant differences between CAMSAP2 S835 mutants in their ability to restore Golgi structure and localization, indicating functional differences between these mutants. To better reflect the regulatory role of MARK2-mediated phosphorylation of CAMSAP2, we performed scratch wound-healing experiments in MARK2 KO cells by establishing stable cell lines expressing CAMSAP2 S835 mutants. These experiments allowed us to assess Golgi reorientation during wound healing and are presented in Figure 5A-C.

      We also attempted to generate stable cell lines expressing GFP-CAMSAP2 and its mutants in CAMSAP2 KO cells. Unfortunately, these cells consistently failed to survive, preventing successful construction of the cell lines.

      We hope these experiments and explanations address the reviewer’s concerns. We are grateful for the reviewer’s constructive feedback, which has helped us refine and improve our study.

      (5) The data presented in Fig. 6A and B are not sufficient to support the authors' notion that "our observation revealed notable changes in the Golgi apparatus and microtubule network distribution in relation to the wounding. (page 11)"  

      Fig. 6A, which includes only a single-cell image in each panel, does not demonstrate the general state of microtubules and the Golgi in the wound-edge cells. The reader cannot even know the migration direction of each cell.  

      Fig.6 B are not suitable to quantitatively support the authors' claim. The authors should find a way to quantitatively estimate the microtubule density around the Golgi and the shape and compactness of the Golgi in each cell facing the wound, not estimating the colocalization of microtubules and the Golgi, as in the present Fig. 6B.  

      We sincerely apologize for the confusion caused by our unclear descriptions and presentation.

      Here, we clarify the purpose and improvements made to address the reviewer’s concerns. In this study, we primarily aimed to observe the relationship between microtubules and the Golgi apparatus in cells at the leading edge of the wound during directed migration. In Figure 6A (now Supplementary Figure 6E), the images represent cells located at the wound edge at different time points. To improve clarity, we have added arrows indicating the migration direction and updated the figure legend to describe these details (page 40 lines 13-14).

      To better quantify the relationship between microtubules and the Golgi apparatus, we revised our analysis by referring to the quantitative method used in Figure 3F of the paper “Molecular Pathway of Microtubule Organization at the Golgi Apparatus”. Specifically, we performed a radial analysis of fluorescence intensity in cells at the wound edge, measuring the distance from the Golgi center (x-axis) and the normalized radial fluorescence intensity of microtubules and the Golgi (y-axis). These results are now presented in Supplementary Figure 6E and 6F.
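
      A radial intensity analysis of this kind can be sketched as follows. This is a minimal illustration, assuming a single-channel image stored as a 2D list and a known Golgi center. The function name `radial_profile`, the one-pixel bin width, and the normalization to the peak bin are assumptions made for the example and may differ from the procedure actually used for Supplementary Figure 6E and 6F.

```python
import math


def radial_profile(image, center, bin_width=1.0):
    """Mean intensity per radial bin around `center`, normalized to the peak bin.

    image  : 2D list of pixel intensities (rows of columns)
    center : (row, col) of the reference point, e.g. the Golgi center
    Returns a list of (distance, normalized_mean_intensity) tuples, where
    distance is the inner edge of each radial bin.
    """
    sums, counts = {}, {}
    cy, cx = center
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            # Assign each pixel to a ring according to its distance from the center.
            k = int(math.hypot(r - cy, c - cx) / bin_width)
            sums[k] = sums.get(k, 0.0) + value
            counts[k] = counts.get(k, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}
    peak = max(means.values())
    scale = peak if peak > 0 else 1.0  # avoid dividing by zero for a blank image
    return [(k * bin_width, means[k] / scale) for k in sorted(means)]


# Toy 5x5 "image" with signal concentrated at the center pixel.
toy = [[0] * 5 for _ in range(5)]
toy[2][2] = 8
for distance, intensity in radial_profile(toy, center=(2, 2)):
    print(distance, intensity)
```

      Plotting the returned distances on the x-axis and the normalized intensities on the y-axis, once per channel, gives curves of the type described above.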

      We hope these improvements address the reviewer’s concerns and provide stronger evidence for the changes in the Golgi apparatus and microtubule network distribution in relation to wound healing. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the clarity and rigor of our study.

      The legends to Fig. 6A and B indicate that they compared immunofluorescent staining of cells at the edge of the wound after 0.5h and 2 h of migration. However, the authors state in the text that they compared "the cells located before the wound" and "the cells at the trailing edge of the wounding (page 11)." Although this description is highly ambiguous and misleading, if they compared the wound-edge cells and the cells separated from the wound edge at 2 h after cell migration here, they should improve the experimental design as I pointed out in the 2nd major comment.  

      We thank the reviewer for their detailed feedback regarding the experimental design and the need to clarify our descriptions. We have addressed these concerns as follows:

      - Clarification of descriptions:

      We recognize that the previous description in the text regarding "the cells located before the wound" and "the cells at the trailing edge of the wounding" was ambiguous and potentially misleading. We have revised this text to accurately describe the experimental design. Specifically, we compared cells at the leading edge of the wound at different time points (0.5h and 2h post-migration). These corrections are reflected in the figure legends (Supplementary Figure 6E and 6F) and the Results section (page 11, lines 3-8).

      - Improved experimental design:

      To better support our conclusions, we performed live-cell imaging to observe the dynamic changes in the Golgi apparatus during directed migration. As shown in Supplementary Figure 2A, our results confirm that the Golgi apparatus undergoes a transient dispersed state before reorganizing into an intact structure.

      Additionally, we performed fixed-cell staining at different time points to analyze the colocalization of CAMSAP2 with the Golgi apparatus in cells at the leading edge of the wound. The colocalization analysis, presented in Figures 1A-C, further demonstrates the dynamic regulation of CAMSAP2 during Golgi reorientation.

      We hope these updates address the reviewer’s concerns and provide a clearer and more robust foundation for our conclusions. We are grateful for the reviewer’s constructive feedback, which has greatly enhanced the clarity and rigor of our study.

      Minor comments  

      (1) In Fig. 2 and Supplementary Fig. 3, the authors claim that MARK2 is enriched around the Golgi. However, this claim was based on immunofluorescent images of single cells and single-line scans.  

      It is better to present the statistical data for Pearson's coefficient as shown in Figs. 1D and E. To demonstrate MARK2 enrichment around Golgi, but not localization in Golgi, the authors should find a way to quantify the specific enrichment of MARK2 signals in the Golgi region.  

      We thank the reviewer for raising this important point regarding the enrichment of MARK2 around the Golgi apparatus. Upon further consideration, we acknowledge that our current data do not provide sufficient evidence to fully elucidate the mechanism of MARK2 localization to the Golgi.

      To maintain the scientific rigor of our study, we have removed this claim and the corresponding content from the manuscript, including original Figures 2 and Supplementary Figure 3 that specifically discuss MARK2 enrichment. These changes do not affect the primary conclusions of the study, which focus on the role of MARK2-mediated phosphorylation of CAMSAP2.

      We hope this clarification addresses the reviewer’s concerns. In the future, we plan to investigate the precise mechanism of MARK2 localization using additional experimental approaches. We are grateful for the reviewer’s constructive feedback, which has helped us refine the scope and focus of our manuscript.

      (2) In Fig. 3 and Supplementary Fig. 4, the authors report that CAMSAP2 localization on the Golgi is reduced in cells lacking MARK2.  

      Essentially, the present results support this claim. However, the authors should analyze the Golgi localization of CAMSAP2 with the same quantification parameter because they used Pearson's coefficient in Fig. 1D, E and Supplementary Fig.4D but Mander's coefficient in Fig. 3C and Fig.4F.  

      We thank the reviewer for their insightful comment regarding the consistency of quantification parameters used in our analysis of CAMSAP2 localization on the Golgi apparatus.

      To address this concern, we have revised Figure 3C to use Pearson’s coefficient for consistency with Figure 1D, 1E (Figure 1B and 1E in the revised manuscript), and Supplementary Figure 4D (Supplementary Figure 3I in the revised manuscript). This ensures uniformity in the quantification parameters across these analyses.

      For Figure 4F, we have retained Mander’s coefficient, as it accounts for variability in expression levels due to overexpression in individual cells. We believe this approach provides a more accurate reflection of CAMSAP2 localization under the experimental conditions shown in Figure 4F.
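
      For readers less familiar with the two colocalization measures discussed here, the sketch below shows how each is commonly computed from paired pixel intensities. It is an illustration only. The function names, the threshold of zero, and the toy intensity values are assumptions made for the example, not the settings used in the manuscript.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length intensity lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def manders_m1(x, y, y_threshold=0.0):
    """Manders' M1: fraction of total x intensity found in pixels where y exceeds a threshold."""
    total = sum(x)
    overlapping = sum(a for a, b in zip(x, y) if b > y_threshold)
    return overlapping / total


# Made-up pixel intensities for two channels (not experimental data).
camsap2 = [10, 80, 90, 5, 0, 60]
golgi = [0, 70, 100, 0, 0, 50]

print(pearson(camsap2, golgi))     # correlation of the two intensity patterns
print(manders_m1(camsap2, golgi))  # share of CAMSAP2 signal lying on Golgi pixels
```

      Pearson's coefficient describes how well the two intensity patterns co-vary across all pixels, whereas M1 reports the fraction of one channel's signal that falls within the region occupied by the other, so the two measures answer related but different questions.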

      We hope these adjustments clarify our analysis and address the reviewer’s concerns. We greatly appreciate the reviewer’s constructive feedback, which has helped improve the consistency and accuracy of our study.

      (3) In Fig.4D-F, the authors claim that S835 phosphorylation of CAMSAP2 is essential for its localization to the Golgi apparatus and for restoring the Golgi dispersion induced by CAMSAP2 depletion.  

      Fig.4E indicates that the S835A mutant of CAMSAP2 significantly restores the compact assembly of the Golgi apparatus, and the differences in the rescue activities of the wild type, S835A, and S835D are rather small. These data contradict the authors' conclusions regarding the pivotal role of MARK2-mediated phosphorylation at the S835 site of CAMSAP2 in maintaining the Golgi architecture (page 9). The authors should remove the phrase "MARK2-mediated" from the sentence unless addressing the aforementioned issues (see 3rd major comment) and describe the role of S835 phosphorylation in a more subdued tone.  

      We thank the reviewer for their constructive feedback regarding the conclusions drawn about the role of MARK2-mediated phosphorylation of CAMSAP2 at S835.

      In response, we have revised the relevant sentence to reflect a more nuanced interpretation of the data. Specifically, the original statement:

      “These observations indicate that the phosphorylation of serine 835 in CAMSAP2 is essential for its proper localization to the Golgi apparatus.”

      has been updated to:

      “These observations indicate that MARK2 phosphorylation of serine at position 835 of CAMSAP2 affects the localization of CAMSAP2 on the Golgi and regulates Golgi structure” (page 9, Lines 27-29).

      We hope this modification addresses the reviewer’s concerns. We are grateful for the feedback, which has helped us refine our conclusions and enhance the clarity of our manuscript.

      (4) In Figs. 5I, J and Supplementary Fig.7A-E, the authors claim that the S835 phosphorylation-dependent interaction of CAMSAP2 with Uso1 is essential for its localization to the Golgi apparatus.  

      This claim was made based on immunofluorescent images of single cells and single-line scans, and was not sufficiently verified (Supplementary Fig.7B, C). Because this is a crucial claim for the present paper, the authors should present statistical data for Pearson's coefficient, as shown in Fig. 1D and E, to quantitatively estimate the Golgi localization of CAMSAP2.  

      We thank the reviewer for their suggestion to present statistical data using Pearson's coefficient for a more robust quantification of the Golgi localization of CAMSAP2.

      In response, we have revised the statistical analysis for Supplementary Figures 7B-C (Revised Figures 6F and 6G) to use Pearson's coefficient. This change ensures consistency with the quantification methods used in Figures 1D and 1E (Revised Figures 1B and 1E), allowing for a more standardized evaluation of CAMSAP2’s localization to the Golgi apparatus.

      We hope this modification addresses the reviewer’s concerns and strengthens the quantitative support for our claims. We are grateful for the reviewer’s constructive feedback, which has helped improve the rigor of our study.

      (5) The signal intensities of the immunofluorescent data in Fig. 4D, Fig. 5A, Sup-Fig. 3C and E, and Sup-Fig. 7S are very weak for readers to clearly estimate the authors' claims. They should be improved appropriately.  

      We thank the reviewer for highlighting the need to improve the clarity of the immunofluorescent data presented in several figures.

      In response, we have enhanced the signal intensities in Figures 4D, 5A, and Supplementary Figure 7D (Revised Supplementary Figure 6A) to make the signals clearer for readers, while ensuring that the adjustments do not alter the integrity of the original data. Supplementary Figures 3C and 3E were removed from our manuscript.

      Additionally, to improve consistency and readability across the manuscript, we have standardized the quantification methods for similar analyses:

      For CAMSAP2 localization to the Golgi, Pearson's coefficient has been used throughout the manuscript. Figure 3C has been updated to use Pearson's coefficient for consistency.

      For Golgi state analysis in wound-edge cells, we have used the Golgi position relative to the nucleus as a uniform metric. This has been applied to Supplementary Figures 1F and 1G, Figures 2D and 2E, and Figures 5A and 5B.

      We hope these adjustments address the reviewer’s concerns and improve the clarity and consistency of our study. We greatly appreciate the reviewer’s constructive feedback, which has significantly enhanced the quality of our manuscript.

      (6) As indicated above, the authors frequently change the parameters or methods for quantifying the same phenomena (for example, the localization of CAMSAP on the Golgi and Golgi state in wound edge cells) in each figure. This is highly confusing. They should unify them.  

      We thank the reviewer for their valuable feedback regarding the inconsistency in quantification methods across the manuscript.

      To address this concern, we have carefully reviewed the entire manuscript and standardized the methods used for quantifying similar phenomena:

      - CAMSAP2 localization on the Golgi: 

      Pearson's coefficient is now consistently used throughout the manuscript. For example, Figure 3C has been updated to use Pearson's coefficient to align with other figures, such as Figures 1B and 1E.

      - Golgi state in wound-edge cells: 

      The Golgi state is now uniformly measured based on the position of the Golgi relative to the nucleus. This method has been applied to Supplementary Figures 1F and 1G, Figures 2D and 2E, and Figures 5A and 5B.

      We believe these changes significantly improve the clarity and consistency of the manuscript, ensuring that readers can easily interpret the data. We are grateful for the reviewer’s constructive feedback, which has greatly helped us enhance the quality and rigor of our study.

      (7) The legends frequently fail to clearly indicate the number of independent experiments on which each statistical analysis was based.  

      We thank the reviewer for highlighting the need to clearly indicate the number of independent experiments for each statistical analysis.

      In response, we have carefully reviewed the entire manuscript and updated the figure legends to include the number of independent experiments for every statistical analysis. This ensures transparency and allows readers to better evaluate the reliability of the data.

      We hope these updates address the reviewer’s concerns and improve the clarity and rigor of the manuscript. We appreciate the reviewer’s constructive feedback, which has helped us enhance the quality of our work.

      (8) Supplemental Figs. 4E and 4F are not cited in the text.  

      We thank the reviewer for pointing out that Supplemental Figures 4E and 4F were not cited in the text.

      To address this, we have updated the manuscript to cite these figures (Revised Figures 2H and 2I) in the appropriate section (page 8, lines 1-5).

      “the absence of MARK2 can also influence the orientation of the Golgi apparatus during cell wound healing and cause a delay in wound closure (Figure 2 D-I and Figure 3 D).”

      We hope this revision resolves the reviewer’s concern and improves the clarity and completeness of the manuscript. We appreciate the reviewer’s feedback, which has helped us refine our work.

      (9) The data in Fig. 3 analyzed MARK2 knockout cells (not knockdown cells). The caption should be corrected.  

      We thank the reviewer for pointing out the incorrect use of "knockdown" in the caption of Figure 3.

      To address this, we have revised the title of Figure 3 from:

      “MARK2 knockdown reduces CAMSAP2 localization on the Golgi apparatus.”

      to:

      “MARK2 affects CAMSAP2 localization on the Golgi apparatus.”

      This updated caption reflects the inclusion of both MARK2 knockout and knockdown cell lines analyzed in Figure 3.

      We hope this correction resolves the reviewer’s concern and ensures the accuracy of our manuscript. We greatly appreciate the reviewer’s attention to detail, which has helped us improve the clarity and consistency of our work.

      (10) The present caption in Fig. 6 disagrees with the content of the figure.  

      We thank the reviewer for pointing out the inconsistency between the caption and the content of Figure 6.

      To address this issue, we have revised the content of Figure 6 to ensure it aligns accurately with the caption. The updated figure now reflects the description provided in the caption, eliminating any discrepancies and improving clarity for the readers.

      We appreciate the reviewer’s constructive feedback, which has helped us enhance the accuracy and presentation of our manuscript.

      (11) What do "CS" indicate in Fig. 4B and Supplementary Fig. 5D? The style used to indicate point mutants of CAMSAP2 should be unified. 835A or S835A?  

      We thank the reviewer for pointing out the inconsistency in the naming of CAMSAP2 mutants.

      To address this, we have revised all relevant figures and text to use the consistent format "S835A" and "S589A" for CAMSAP2 mutants. Specifically, in Figure 4B and Supplementary Figure 5D (now Supplementary Figure 4C), we have replaced the abbreviation "CS2" with "CAMSAP2" and updated the mutant names from "835A" and "589A" to "S835A" and "S589A," respectively.

      We hope these updates resolve the reviewer’s concerns and ensure clarity and consistency throughout the manuscript. We are grateful for the reviewer’s attention to detail, which has helped us improve the quality of our work.

      (12) Uso1 is not a Golgi matrix protein.  

      We thank the reviewer for pointing out the incorrect description of Uso1 as a Golgi matrix protein.

      In response, we have revised the manuscript to replace all references to “USO1 as a Golgi matrix protein” with “USO1 as a Golgi-associated protein.” This correction ensures that the terminology used in the manuscript is accurate and consistent with current scientific understanding.

      We appreciate the reviewer’s attention to detail, which has helped us improve the accuracy and quality of our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, De La Forest Divonne et al. build a repertory of hemocytes from adult Pacific oysters combining scRNAseq data with cytologic and biochemical analyses. Three categories of hemocytes were described previously in this species (i.e. blast, hyalinocyte, and granulocytes). Based on scRNAseq data, the authors identified 7 hemocyte clusters presenting distinct transcriptional signatures. Using Kegg pathway enrichment and RBGOA, the authors determined the main molecular features of the clusters. In parallel, using cytologic markers, the authors classified 7 populations of hemocytes (i.e. ML, H, BBL, ABL, SGC, BGC, and VC) presenting distinct sizes, nucleus sizes, acidophilic/basophilic, presence of pseudopods, cytoplasm/nucleus ratio and presence of granules. Then, the authors compared the phenotypic features with potential transcriptional signatures seen in the scRNAseq. The hemocytes were separated in a density gradient to enrich for specific subpopulations. The cell composition of each cell fraction was determined using cytologic markers and the cell fractions were analysed by quantitative PCR targeting major cluster markers (two per cluster). With this approach, the authors could assign cluster 7 to VC, cluster 2 to H, and cluster 3 to SGC. The other clusters did not show a clear association with this experimental approach. Using phagocytic assays, ROS, and copper monitoring, the authors showed that ML and SGC are phagocytic, ML produces ROS, and SGC and BGC accumulate copper. Then with the density gradient/qPCR approach, the authors identified the populations expressing anti-microbial peptides (ABL, BBL, and H). At last, the authors used Monocle to predict differentiation trajectories for each subgroup of hemocytes using cluster 4 as the progenitor subpopulation.

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Strengths:

      The combination of the two approaches offers a more integrative view.

      Hemocytes represent a very plastic cell population that has key roles in homeostatic and challenged conditions. Grasping the molecular features of these cells at the single-cell level will help understand their biology.

      This type of study may help elucidate the diversification of immune cells in comparative studies and evolutionary immunology.

      Weaknesses:

      The study should be more cautious about the conclusions, include further analyses, and inscribe the work in a more general framework.

      Reviewer #1 (Recommendations for the authors):

      The manuscript provides a comprehensive characterisation of the diversity of circulating immune cells found in Pacific oysters.

      Major comments:

      (1) The introduction would benefit from a clear description of what is known about immune cell development and diversity in this model. The bibliography on the three subtypes origins and properties (i.e. blast, hyalinocyte, and granulocytes) should be described in the introduction.

      We thank Reviewer #1 for their valuable comments, which have allowed us to further improve our manuscript. We have enriched the introduction with the following addition (line 79 to 82):

      “Blast-like cells are considered as undifferentiated hemocyte types (20), hyalinocytes (21) seem to be more involved in wound repair, and granulocytes, more implicated in immune surveillance. The latter are considered as the main immunocompetent hemocyte types (22).”

      (2) The authors mentioned a previous scRNAseq dataset produced in another oyster species. They should compare the two datasets to show the robustness of the molecular signatures determined in the present study. In addition, the authors do not mention markers identified in the literature that could be relevant to characterize the clusters (e.g. inflammatory pathway PMID: 29751033, proliferative markers PMID: 36591234/ PMID: 29317231, granulocyte markers PMID: 30633961 ... list not exhaustive). Overall, the comparison of this manuscript dataset and the available literature is too partial.

      We appreciate the reviewer’s suggestion to compare our dataset with previously published scRNAseq data and to integrate markers from the literature. Below, we address these points in detail.

      The transcription factors involved in hematopoiesis, such as Tal1, Sox, Runx, and GATA, are highly conserved across metazoans. These markers were identified in our dataset, consistent with findings in other species (13), including the previously mentioned scRNA-seq dataset in C. hongkongensis (4). However, defining robust and specific markers for distinct hemocyte types remains an ambitious goal that requires validation across diverse biological contexts - work that is beyond the scope of the present study. Additionally, meaningful comparisons between datasets are constrained by differences in annotation frameworks and the absence of a standardized system for defining hemocyte subtypes. These limitations underscore the need for harmonization efforts to facilitate robust cross-study comparisons. Nonetheless, our dataset provides a strong foundation for future comparative analyses once such standardization is achieved.

      In response to the reviewer’s comment, we have added a paragraph to the discussion (lines 747 - 760) detailing that we identified conserved transcription factor markers in C. gigas and C. hongkongensis.

      (3) The authors sequenced 3000 cells without providing more comprehensive information/rationale on the analysed population. What is the number of hemocytes found in an adult? What proportion of the whole hemocyte population does this analysis represent? Does it include the tissue-interacting hemocytes? Also, what is the rationale for choosing that specific stage?

      We thank the reviewer for their insightful questions regarding the analyzed hemocyte population.

      Adult 18-month-old Crassostrea gigas contain approximately 1 million circulating hemocytes per mL of hemolymph, with an average of 1 mL of hemolymph per individual. Thus, this represents approximately 1 million circulating hemocytes per oyster. For our scRNA-seq analysis, we sampled 3,000 hemocytes, which corresponds to 0.3% of the total circulating hemocyte population.
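
      The sampled fraction quoted above follows from simple arithmetic. The short sketch below only restates the figures given in this response; the variable names are illustrative and are not taken from the authors' pipeline.

```python
# Figures quoted in the response above; variable names are illustrative only.
hemocytes_per_ml = 1_000_000   # circulating hemocytes per mL of hemolymph
hemolymph_ml = 1.0             # average hemolymph volume per oyster, in mL
cells_sampled = 3_000          # target number of cells for scRNA-seq

total_hemocytes = hemocytes_per_ml * hemolymph_ml
sampled_fraction = cells_sampled / total_hemocytes

print(f"Sampled fraction: {sampled_fraction:.1%}")  # prints: Sampled fraction: 0.3%
```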

      The number of cells processed was optimized to minimize the occurrence of doublets during scRNAseq. Following 10x Genomics Chromium guidelines, we loaded 4,950 cells to recover a target of 3,000 cells, with a doublet rate of 2.4%, just below the target threshold of 2.5%. This information has been added on line 125 of the document. As reported in Supplementary Table S1, the estimated number of cells after STAR-solo alignment was 2,937, close to this target. This ensures the reliability and accuracy of the single-cell transcriptomic data.

      We selected 18-month-old oysters for two key reasons: (i) to facilitate hemolymph collection, as hemocyte counts are more stable and sufficient at this stage, enabling us to collect enough cells for all planned experiments, including functional and cytological analyses; and (ii) to use oysters that are not susceptible to OsHV-1 μVar herpesvirus, which predominantly affects younger animals. This ensured that the hemocyte populations analyzed were not influenced by viral infections or related immune responses.

      Our study focused on circulating hemocytes collected from hemolymph, which does not include tissue-interacting hemocytes. While these cells may represent an additional population of interest, they fall outside the scope of our current investigation.

      By carefully selecting the animal stage and optimizing cell sampling, we ensured that the scRNA-seq dataset provides a robust representation of circulating hemocyte diversity while maintaining high data quality.

      (4) For the GO term enrichment analysis, the authors included all genes presenting a cluster enrichment above L2FC>0.25. This seems extremely low to find distinct functions for each cluster. The risk is to call "cluster specific GO term" GO terms for which the genes are poorly enriched in the cluster. For the most important GO term mentioned in the text, the authors should show the expression levels of the genes (with DotPlot similar to Fig1D) to illustrate the specificity of the GO term. Finally, the GO enrichment scores were apparently calculated using the whole genome as background. The analysis, aiming at finding differences between hemocyte subgroups, should use the genes detected in the dataset as background.

      We appreciate the reviewer's concerns regarding the threshold used for GO term enrichment analysis and the choice of background genes. Below, we provide clarification on these points.

      For nuanced comparisons, such as those between activation states of the same cell type, lower thresholds for log2FC (e.g., ≥0.25) are commonly used to detect subtle regulatory shifts. In single-cell RNA sequencing (scRNA-seq) analyses, it is typical to use a log2FC threshold between 0.25 and 0.5 to ensure that biologically relevant, yet subtle, changes are captured. For our analysis, this threshold was chosen to maintain sensitivity to such shifts, particularly given the diversity and functional specialization of hemocyte clusters.

      To address the reviewer's suggestion, we will include DotPlot representations (similar to Fig. 1D) for the most significant GO terms highlighted in the text. This will illustrate the expression levels of the associated genes across clusters and demonstrate their specificity to the identified GO terms.

      Regarding the background used in the GO enrichment analysis, we employed the Rank Based Gene Ontology Analysis (RBGOA) approach, which explicitly states in its documentation: "It is important to have the latter two tables representing the whole genome (or transcriptome) — at least the portion that was measured — rather than some select group of genes since the test relies on comparing the behavior of individual GO categories to the whole." Our analysis was conducted in agreement with these initial recommendations, ensuring that the results are consistent with the methodology outlined for RBGOA.
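
      To make the background question raised in comment (4) concrete, the sketch below compares a simple over-representation test under two backgrounds. It is not RBGOA, which is rank-based and is not reproduced here; it is a plain hypergeometric tail test used only for illustration. Apart from the genome size of 30,724 genes mentioned elsewhere in this response, every count (detected genes, GO term sizes, cluster list size, overlap) is a hypothetical value chosen for the example.

```python
from math import comb


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from a
    background of N genes, K of which carry the GO term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


cluster_genes = 300   # hypothetical size of a cluster's gene list
overlap = 10          # hypothetical number of those genes carrying the GO term

# Background 1: whole genome (30,724 genes; GO term size of 200 is hypothetical).
p_genome = hypergeom_tail(overlap, cluster_genes, K=200, N=30_724)

# Background 2: genes detected in the dataset (both counts are hypothetical).
p_detected = hypergeom_tail(overlap, cluster_genes, K=150, N=12_000)

print(f"whole-genome background:   p = {p_genome:.2e}")
print(f"detected-genes background: p = {p_detected:.2e}")
```

      With these made-up counts, the same overlap looks more significant against the whole genome than against the detected genes, which is the effect the reviewer is concerned about for over-representation tests. Whether and how strongly this applies to a rank-based method depends on which genes are supplied as the measured set.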

      (5) The authors reannotated the genes of C. gigas to reach 73.1% annotation. What are the levels of annotations found prior to the reannotation? What do the scores/scale bars from the RBGOA analysis mean in Figures 2B-D?

      Thank you for your comment. The original annotation for C. gigas was based on the work of Penaloza et al. (5), which provided GO annotations for 18,750 out of 30,724 genes, corresponding to 61% annotation. Following our reannotation efforts, we were able to increase the annotation coverage to 73.1%, enhancing the resolution of downstream analyses. In response to the reviewer’s comment, we have updated the results section (lines 211 and 216) to explicitly include the original annotation coverage of 61% from the work of Penaloza et al., followed by details on our newly achieved annotation percentage of 73.1%.

      Thank you for pointing this out. We apologize for the oversight regarding the scale bar in Figures 2B-D. The colors in the original figure correspond to a z-score calculated from the gene ratio, which was not clearly explained and may have caused confusion. In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation (Figure 2B).

      (6) The authors describe first the result of the Kegg enrichment analysis and then of the RBGOA. To gain fluidity, I would suggest merging the results of both Kegg and RBGOA for each cluster.

      Thank you for the suggestion. To enhance the fluidity of the results section, we have redesigned the KEGG/RBGOA figure (see figure 2A and 2B) to present the results for each cluster in an integrated manner. This revised approach aims to provide a clearer and more cohesive representation of the findings.

      (7) The authors make correlations between gradient fraction containing multiple hemocyte populations and qPCR expression levels of cluster-specific markers to associated cytologic features with specific clusters. If feasible, I would recommend validating the association of several markers with hemocyte subgroups using in situ hybridisation or immunolabelling.

      Cytological identification of hemocytes in our study relies on MCDH staining, which provides detailed morphological and cytological information. Unfortunately, the fixation methods required for in situ hybridization (ISH) or immunolabeling are not compatible with those used for MCDH staining. We attempted to combine these approaches but found that the fixation protocols necessary for ISH or immunolabeling compromised the quality of the cytological features observed with MCDH staining. Consequently, such validation was not feasible within the constraints of our experimental setup.

      (8) Anti-microbial peptides are mentioned as enriched in agranular cells based on the gradient/qPCR analysis (Figure 6). Are these AMPs regulated by inflammatory pathways? Are any inflammatory pathways enriched in any scRNAseq cluster? In addition, without validating the data by directly labelling AMP in the different populations, it seems hard to conclude that AMP are expressed only by agranular cells.

      In oysters, two families of antimicrobial peptides/proteins appear to be transcriptionally regulated in hemocytes in response to an infection. The first is that of Cg-BigDefs (6). A 2020 article indicates that the expression of CgBigDef1 is regulated by CgRel, an ortholog of the NFkB transcription factor, which also controls the expression of the proinflammatory cytokine CgIL17 (7). Cg-BPI is induced in response to infection but its regulatory pathways remain unknown (8). The last well-characterized family of antimicrobial peptides is Cg-Defs. It exhibits constitutive expression in hemocytes.

      In our scRNA-seq analysis, CgRel (G12420) shows an increased expression in cluster 5, with a log2FC of 0.4 (equivalent to a 1.32-fold change or 32% higher expression compared to other clusters). Cluster 5 corresponds to blast-like cells, which are transcriptionally distinct and predominantly found in fractions 1, 2, and 3. These same fractions exhibit the highest CgBigDef expression, as demonstrated by qPCR.

      From our qPCR results, we see no expression of the three AMP families in cell-sorted granular cells while the cell-sorted agranular cells are positive for the three AMP families, even for inducible ones. Still, we agree that labelling of cell sorted hemocyte populations would reinforce our data. We now specify in the text that further staining would be necessary to confirm these transcriptomic results (Discussion, lines 695 to 296).
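
      The log2FC values quoted in this response can be converted to linear fold changes with a one-line formula, fold change = 2^log2FC. The sketch below is only a worked check of that arithmetic for the two values mentioned in the text (0.4 for CgRel in cluster 5, and the 0.25 threshold); the function name is illustrative.

```python
def log2fc_to_fold_change(log2fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


# Values quoted in the response: CgRel in cluster 5, and the enrichment threshold.
for label, value in [("CgRel, cluster 5", 0.4), ("enrichment threshold", 0.25)]:
    fold = log2fc_to_fold_change(value)
    percent_higher = (fold - 1.0) * 100.0
    print(f"{label}: log2FC {value} = {fold:.2f}-fold ({percent_higher:.0f}% higher)")
```

      By the same conversion, the 0.25 threshold corresponds to roughly a 1.19-fold enrichment, which gives a sense of scale for the reviewer's remark that the threshold is low.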

      (9) The authors should play down some statements concerning cluster identity. In the absence of a true lineage tracing approach, it is possible that those clusters represent states rather than true cell subtypes. Immune cells are very plastic in nature and able to adapt to the environment, even in conditions that are considered homeostatic.

      We appreciate the reviewer’s insightful comment regarding the plasticity of immune cells and the potential for clusters to represent states rather than distinct cell subtypes. We agree that, in the absence of a lineage tracing approach, definitive classification of clusters as fixed subtypes is challenging. Immune cells, including those in invertebrates, are known for their high degree of plasticity and adaptability to environmental cues.

      In response to the reviewer’s comment, we have revised the Discussion section to include a statement clarifying that these clusters may represent dynamic states rather than fixed subtypes, thereby acknowledging the plasticity of immune cells (lines 766 to 770).

      (10) Related to the above issue, there is no indication of stem cells being present in the cell population. Is there any possibility to look for proliferative or progenitor markers? In homeostatic and in challenged conditions (for example Zymosan treatment)? This would provide some hints into the cellular pathways involved in the response. Perhaps determining the number/fraction of phagocytic cells in challenged conditions would help as well, in the absence of time-lapse assays.

      Thank you for highlighting the possibility of stem cells or progenitor markers in our hemocyte populations. In our current analysis, we did not detect any known stem cell or proliferative markers, nor evidence of a clearly defined hematopoiesis site in the hemolymph. Indeed, previous work suggests that oyster hematopoiesis may occur in tissues such as the gills, implying that stem or progenitor cells might not circulate in the hemolymph under homeostatic conditions. Consequently, it is plausible that our observation of no proliferative cell populations partly reflects their absence in hemolymph, especially in naïve (unstimulated) oysters. To conclusively identify potential progenitor cells and their proliferative activity, further approaches involving deliberate perturbation of hemocyte homeostasis - such as immunological challenge (e.g., Zymosan treatment) combined with lineage-tracing or proliferation assays - would be necessary. These future investigations would not only clarify whether proliferative cells emerge in the hemolymph in response to environmental or pathological stimuli but also help elucidate the broader cellular pathways underlying oyster immune responses.

      In response to the reviewer’s comment, we have revised the Discussion (lines 742 to 745) and added : “Nevertheless, we did not detect any canonical stem or progenitor cell populations in our dataset, underscoring the need for future investigations - potentially involving immunological challenges and lineage-tracing assays - to clarify whether proliferative cells circulate in the hemolymph or instead reside primarily in tissue compartments.”

      (11) Could the authors discuss the phagocytic hemocytes in light of scavenger receptor expression?

      We thank the reviewer for this insightful question. Our study identifies macrophage-like cells and small granule cells as the principal phagocytes in Crassostrea gigas, capable of robust pathogen engulfment. Transcriptomic data reveal that these cell types express markers associated with endocytosis and immune defense pathways, such as CLEC and LACC24, which are integral to their phagocytic functionality.

      Interestingly, our single-cell RNA sequencing analysis indicates that cluster 3, corresponding to small granule cells, expresses the scavenger receptor cysteine-rich (SRCR) gene G3876, annotated as a low-density lipoprotein receptor-related protein, with a Log2 fold change (Log2FC) of 0.77. This finding directly links small granule cells to scavenger receptor-mediated functions, supporting their role as professional phagocytes. Scavenger receptors, including SRCR proteins, are known for their ability to bind and internalize diverse ligands, including pathogens, and their presence in small granule cells highlights a potential mechanism for pathogen recognition and clearance.

      Additionally, scavenger receptors are significantly expanded in oysters, as shown in Wang et al. (9). These receptors exhibit dynamic upregulation in hemocytes upon pathogen exposure, particularly following stimulation with pathogen-associated molecular patterns (PAMPs) such as lipopolysaccharide (LPS). This evidence suggests that SRCR proteins, including the one identified in our study, play a pivotal role in the phagocytic activities of hemocytes by facilitating pathogen recognition and internalization.

      We propose to add this paragraph (lines 610 to 618) in the Discussion: “Interestingly, our scRNA-seq analysis indicates that SGC (cluster 3) expresses the scavenger receptor cysteine-rich (SRCR) gene G3876, annotated as a low-density lipoprotein receptor-related protein, with a Log2 fold change (Log2FC) of 0.77, linking them to scavenger receptor-mediated pathogen recognition and clearance. This aligns with findings by Wang et al. (9), who demonstrated significant expansion and dynamic regulation of SRCR genes in response to pathogen-associated molecular patterns.”

      (12) I am not convinced by the added value of the lineage analysis and the manuscript could stand without it. There is no experimental validation to substantiate the filiation between the clusters. In addition, rooting the lineage to cluster 4 is poorly justified (enrichment in the ribosomal transcript). Cluster 6 is also enriched in ribosomal transcripts and this enrichment can be caused by the low threshold used for the selection of cluster-specific genes (L2FC >0.25). Finally, cluster 4 > VC and cluster 4 > SGC belong to the same lineage according to Figure 7F-H.

      We thank the reviewer for their detailed comments regarding the lineage analysis. We acknowledge the limitations in experimentally validating the proposed filiation between clusters, as hemocytes in Crassostrea gigas cannot currently be cultivated ex-vivo, and we lack the ability to isolate cells specifically from cluster 4 for further functional assays. Consequently, our lineage analysis is based solely on transcriptomic data and pseudo-time trajectory analysis.

      Hematopoietic stem cells (HSCs) are a population of stem cells that are largely cell-cycle-quiescent (G0 phase) with low biosynthetic activity. Upon stimulation and stress, HSCs undergo proliferation and differentiation and produce all lineages of hemocytes.

      Ribosomal proteins play a multifaceted role in preserving the balance between stem cell quiescence and activation. By ensuring precise regulation of protein synthesis, they allow stem cells to maintain their undifferentiated state while remaining poised for activation when needed. Furthermore, ribosomal proteins contribute to the cellular stress response, safeguarding stem cells from oxidative damage and other stressors that could compromise their functionality. Importantly, ribosomal biogenesis and the dynamic assembly of ribosomes provide a regulatory mechanism that fine-tunes the transition from self-renewal to differentiation, a critical feature of hematopoietic stem cells (HSCs) and other stem cell types. These mechanisms collectively highlight the indispensable role of ribosomal proteins in stem cell biology, underscoring their relevance to our study's findings.

      In vertebrates, the maintenance of hematopoietic stem cells (HSCs) and hematopoietic homeostasis is widely acknowledged to rely on the proper regulation of ribosome function and protein synthesis (10). This process necessitates the coordinated expression of numerous genes, including genes that encode ribosomal proteins (RP genes) and those involved in regulating ribosome biogenesis and protein translation. Disruptions or mutations in these critical genes are associated with the development of congenital disorders (11). Among these, Rpl22 (found in cluster 4 with a Log2FC of 1.59) has been shown to play a pivotal role in HSC maintenance by balancing ribosomal protein paralog activity, which is critical for the emergence and function of HSCs (12).

      Regarding the justification for rooting the lineage to cluster 4, our decision was informed by the enrichment of ribosomal transcripts and functional annotations suggesting a role in translation and cell proliferation, consistent with a precursor-like state. The use of a log2 fold-change (L2FC) threshold of >0.25, while permissive, allowed us to include subtle but meaningful transcriptional shifts essential for resolving lineage transitions.

      Finally, the lineage progression from cluster 4 to vesicular cells (VC), macrophage-like cells (ML), and ultimately small granule cells (SGC) is supported by trajectory analysis (Figure 7F-H), which consistently places VC and ML as intermediates in the differentiation process toward SGC. Although experimental validation is currently not feasible, these findings provide a conceptual framework for future investigations when cell isolation and functional validation tools become available.

      (13) The figures containing heatmaps (Figure 7, Figure 2, Figure S10) or too many subpanels (Figure S5) and Table S5 are hardly readable.

      Thank you for highlighting the issues related to the clarity of the heatmaps (Figures 2, 7, and S10), the multi-panel figure (Figure S5), and Table S5. In response to your feedback, we have revised all of these elements to enhance readability and comprehension. Specifically, we increased font sizes, optimized color scales, and reorganized the layout of the subpanels to emphasize the key findings. We also updated Table S5 to ensure that the data are presented in a clear and easily interpretable format.

      We trust that these modifications address the concerns raised and improve the overall clarity of the figures and table.

      (14) A number of single-cell analyses are now available in different species and the authors allude to similar pathways/transcription factors being involved. Perhaps the authors could expand on this in the discussion section.

      Transcription factors involved in hematopoiesis, such as Tal1, Runx and GATA, are highly conserved across metazoans. Consistent with findings in other species, our dataset identifies these markers, reinforcing the evolutionary conservation of these pathways. Furthermore, these markers are also reported in the previous scRNA-seq dataset for C. hongkongensis (4), supporting the robustness of our molecular signatures. However, defining specific and robust markers for distinct hemocyte types remains an ambitious task, requiring additional validation in diverse biological and experimental contexts. This validation is beyond the scope of the present study.

      In addition, meaningful comparisons between scRNA-seq datasets are constrained by differences in annotation frameworks and the absence of standardized definitions for hemocyte subtypes. Harmonizing these datasets to enable robust cross-species comparisons is a critical challenge for future studies. Nonetheless, the insights provided by our dataset establish a strong foundation for such comparative analyses when these standardization efforts are realized.

      In crayfish (1), 16 transcriptomic clusters were identified corresponding to three hemocyte types, with markers such as integrin prominently expressed in hyalinocytes, consistent with our identification of integrin-related genes in hemocytes. In shrimp (1), 11 transcriptomic clusters were described, with markers of hemocytes in immune-activated states that we also observed in our dataset. For Anopheles gambiae (2), 8 transcriptomic clusters were identified, including clusters with high ribosomal activity, analogous to those we described in our study. Finally, in Bombyx mori (3), 20 transcriptomic clusters were reported, corresponding to five cytological hemocyte types. Transcription factors such as bHLH, myc, and runt were identified in granulocytes and oenocytoids, showing parallels with markers identified in our dataset.

      Despite these similarities, cross-species comparisons are hindered by variability in genome availability and annotation quality, which complicates the precise identification and functional characterization of genes across datasets. Notably, we did not detect pro-phenoloxidase genes in our dataset, unlike shrimp and crayfish, suggesting potential species-specific differences in immune mechanisms.

      Regarding the previously published C. hongkongensis scRNA-seq dataset (4), we observe overlap in markers such as runx and GATA. However, direct comparisons remain limited due to differences in dataset annotations and definitions of hemocyte subtypes. This underscores the need for standardized frameworks to facilitate cross-study comparisons. While we emphasize that robust cross-species validation was beyond the scope of this study, our findings contribute valuable insights into the molecular signatures of oyster hemocytes and provide a framework for future comparative research.

      We have expanded our discussion to include comparisons with available scRNAseq data from other invertebrate species (lines 747 to 760).

      Minor comments:

      (1) Figure 2A-D: to increase the readability of the figure, the authors should display only the GO terms mentioned in the text and keep the full list in supplementary data.

      To enhance the fluidity of the results section, we have redesigned the KEGG/RBGOA figure to present the results for each cluster in an integrated manner (See figure 2A and 2B).

      (2) Line 223: the authors mention that cluster 1 is characterized by its morphology without providing an explanation or evidence.

      We have revised the description of Cluster 1 to remove references to morphology, ensuring consistency with the data presented at this stage of the manuscript (lines 227 to 229): “Cluster 1, comprising 27.6 % of cells, is characterized by GO-terms related to myosin complex, lamellipodium, membrane and actin cytoskeleton remodelling, as well as phosphotransferase activity.”

      (3) Line 306: the authors mentioned expression levels and associated them with Log2FC, which represents an enrichment, not the level of expression.

      Thank you for pointing this out. We agree that log2FC represents enrichment rather than absolute expression levels. We have revised the text in the manuscript to clarify this distinction (line 309). The corrected text now states that log2FC reflects the degree of enrichment or depletion of a gene in a specific cluster relative to others, rather than its absolute expression level.

      (4) Figure 4B: the figure shows the distribution of all hemocytes subgroups for each fraction. To better appreciate the distribution of the subgroups in the different fractions, it would be good to have the number of cells of each subtype in the fractions.

      We thank the reviewer for their suggestion to include the number of cells of each subtype in the fractions. While we do not have the exact total number of cells per fraction, we systematically performed hemocyte counts for each fraction as part of our methodology. These counts provide a robust estimation of hemocyte distributions across fractions.

      Including these counts in the figure could be an alternative approach; however, we believe it would not significantly enhance the interpretability of the data, as the focus of this analysis is on the relative proportions of hemocyte subtypes rather than absolute numbers. The current representation provides a clear and concise overview of subtype distribution patterns, which aligns with the goals of the study.

      Nevertheless, if the reviewer considers it essential, we are open to integrating the hemocyte counts into the figure or supplementing the information in the text or supplementary materials to provide additional context.

      (5) Line 487-488: the authors mentioned that monocle 3 can deduce the differentiation pathway from the mRNA splice variant. I did not find this information in the publication associated with the statement.

      Thank you for pointing this out. We acknowledge the inaccuracy in our statement regarding Monocle3's capabilities. Monocle3 does not deduce differentiation pathways based on mRNA splice variants, as was erroneously suggested in the manuscript. Instead, Monocle3 performs trajectory inference using gene expression profiles. It calculates distances between cells based on their transcriptomic profiles, where cells with similar profiles are positioned closer together, and those with distinct profiles are farther apart. This method enables the construction of potential differentiation trajectories by identifying paths between transcriptionally related cells.

      We have revised the text in the manuscript to accurately describe this process and removed the incorrect reference to mRNA splice variants (lines 495 to 497).
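
      The idea that cells with similar expression profiles sit close together can be illustrated with a toy example. The sketch below is not Monocle3's algorithm, which learns a graph in a reduced-dimension embedding and measures pseudotime along that graph; it only orders a few hypothetical cells by their Euclidean distance from a chosen root cell. All cell names and expression values are invented for the illustration.

```python
from math import dist

# Hypothetical expression profiles (arbitrary units) for four cells over three genes.
cells = {
    "cell_a": (0.0, 0.0, 1.0),
    "cell_b": (1.0, 0.0, 1.0),
    "cell_c": (2.0, 1.0, 0.0),
    "cell_d": (4.0, 3.0, 0.0),
}

root = "cell_a"  # assumed starting (progenitor-like) cell for the toy ordering

# Distance of every cell from the root in expression space.
distance_from_root = {name: dist(cells[root], profile) for name, profile in cells.items()}

# Crude ordering: cells more similar to the root come first.
ordering = sorted(cells, key=distance_from_root.get)

for name in ordering:
    print(f"{name}: distance from root = {distance_from_root[name]:.2f}")
```

      A straight-line distance from the root ignores branching, which is why dedicated trajectory tools measure progression along a learned graph rather than in this direct way.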

      (6) Figures 6C-H display heatmaps with two columns representing the beginning and the end of the predicted lineage. It would be more informative to show the whole path presented in Figure S10.

      Thank you for pointing out that Figures 7C–H currently only show the beginning and end of the predicted lineage, limiting the clarity of the intermediate stages. In response to your suggestion, we have revised these figures to include the full trajectory as presented in Figure S10, ensuring that the intermediate transitions are more clearly visualized. We believe these modifications offer a more comprehensive overview of the entire lineage and enhance the interpretability of our results.

      Bibliography:

      (1) F. Xin, X. Zhang, Hallmarks of crustacean immune hemocytes at single-cell resolution. Front. Immunol. 14 (2023).

      (2) H. Kwon, M. Mohammed, O. Franzén, J. Ankarklev, R. C. Smith, Single-cell analysis of mosquito hemocytes identifies signatures of immune cell subtypes and cell differentiation. eLife 10, e66192 (2021).

      (3) M. Feng, L. Swevers, J. Sun, Hemocyte Clusters Defined by scRNA-Seq in Bombyx mori: In Silico Analysis of Predicted Marker Genes and Implications for Potential Functional Roles. Front. Immunol. 13 (2022).

      (4) J. Meng, G. Zhang, W.-X. Wang, Functional heterogeneity of immune defenses in molluscan oysters Crassostrea hongkongensis revealed by high-throughput single-cell transcriptome. Fish & Shellfish Immunology 120, 202–213 (2022).

      (5) C. Peñaloza, A. P. Gutierrez, L. Eöry, S. Wang, X. Guo, A. L. Archibald, T. P. Bean, R. D. Houston, A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas. GigaScience 10, giab020 (2021).

      (6) R. D. Rosa, A. Santini, J. Fievet, P. Bulet, D. Destoumieux-Garzón, E. Bachère, Big Defensins, a Diverse Family of Antimicrobial Peptides That Follows Different Patterns of Expression in Hemocytes of the Oyster Crassostrea gigas. PLOS ONE 6, e25594 (2011).

      (7) Y. Li, J. Sun, Y. Zhang, M. Wang, L. Wang, L. Song, CgRel involved in antibacterial immunity by regulating the production of CgIL17s and CgBigDef1 in the Pacific oyster Crassostrea gigas. Fish & Shellfish Immunology 97, 474–482 (2020).

      (8) Evidence of a bactericidal permeability increasing protein in an invertebrate, the Crassostrea gigas Cg-BPI. PNAS. https://www.pnas.org/doi/abs/10.1073/pnas.0702281104.

      (9) L. Wang, H. Zhang, M. Wang, Z. Zhou, W. Wang, R. Liu, M. Huang, C. Yang, L. Qiu, L. Song, The transcriptomic expression of pattern recognition receptors: Insight into molecular recognition of various invading pathogens in Oyster Crassostrea gigas. Developmental & Comparative Immunology 91, 1–7 (2019).

      (10) R. A. J. Signer, J. A. Magee, A. Salic, S. J. Morrison, Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).

      (11) A. Narla, B. L. Ebert, Ribosomopathies: human disorders of ribosome dysfunction. Blood 115, 3196–3205 (2010).

      (12) Y. Zhang, A.-C. E. Duc, S. Rao, X.-L. Sun, A. N. Bilbee, M. Rhodes, Q. Li, D. J. Kappes, J. Rhodes, D. L. Wiest, Control of Hematopoietic Stem Cell Emergence by Antagonistic Functions of Ribosomal Protein Paralogs. Developmental Cell 24, 411–425 (2013).

      Reviewer #2 (Public review):

      Summary:

      This work provides a comprehensive understanding of cellular immunity in bivalves. To precisely describe the hemocytes of the oyster C. gigas, the authors morphologically characterized seven distinct cell groups, which they then correlated with single-cell RNA sequencing analysis, also resulting in seven transcriptional profiles. They employed multiple strategies to establish relationships between each morphotype and the scRNAseq profile. The authors correlated the presence of marker genes from each cluster identified in scRNAseq with hemolymph fractions enriched for different hemocyte morphotypes. This approach allowed them to correlate three of the seven cell types, namely hyalinocytes (H), small granule cells (SGC), and vesicular cells (VC). A macrophage-like (ML) cell type was correlated through the expression of macrophage-specific genes and its capacity to produce reactive oxygen species. Three other cell types correspond to blast-like cells, including an immature blast cell type from which distinct hematopoietic lineages originate to give rise to H, SGC, VC, and ML cells. Additionally, ML cells and SGCs demonstrated phagocytic properties, with SGCs also involved in metal homeostasis. On the other hand, H cells, nongranular cells, and blast cells expressed antimicrobial peptides. This study thus provides a complete landscape of oyster hemocytes with functional validation linked to immune activities. This resource will be valuable for studying the impact of bacterial or viral infections in oysters.

      Strengths:

      The main strength of this study lies in its comprehensive and integrative approach, combining single-cell RNA sequencing, cytological analysis, cell fractionation, and functional assays to provide a robust characterization of hemocyte populations in Crassostrea gigas.

      (1) The innovative use of marker genes, quantifying their expression within specific cell fractions, allows for precise annotation of different cellular clusters, bridging the gap between morphological observations and transcriptional profiles.

      (2) The study provides detailed insights into the immune functions of different hemocyte types, including the identification of professional phagocytes, ROS-producing cells, and cells expressing antimicrobial peptides.

      (3) The identification and analysis of transcription factors specific to different hemocyte types and lineages offer crucial insights into cell fate determination and differentiation processes in oyster immune cells.

      (4) The authors significantly advance the understanding of oyster immune cell diversity by identifying and characterizing seven distinct hemocyte transcriptomic clusters and morphotypes.

      These strengths collectively make this study a significant contribution to the field of invertebrate immunology, providing a comprehensive framework for understanding oyster hemocyte diversity and function.

      Weaknesses:

      (1) The authors performed scRNAseq/lineage analysis and cytological analysis on oysters from two different sources. The methodology of the study raises concerns about the consistency of the sample and the variability of the results. The specific post-processing of hemocytes for scRNAseq, such as cell filtering, might also affect cell populations or gene expression profiles. It's unclear if the seven hemocyte types and their proportions were consistent across both samples. This inconsistency may affect the correlation between morphological and transcriptomic data.

      We thank the reviewer for highlighting the importance of sample consistency and potential variability, and we acknowledge the need for clarification regarding the use of oysters from two different sources.

      Oysters from La Tremblade (known pathogen-free in standardized conditions) were used to establish the hemocyte transcriptomic atlas through scRNA-seq and for cytological analyses. Oysters from the Thau Lagoon (Bouzigues) were used for cytological, functional, and fractionation experiments. These oysters were sampled during non-epidemic periods and monitored under Ifremer’s microbiological surveillance to ensure pathogen-free status.

      The cytological results (hemocytograms) presented in Figure 3 and Supplementary Figure S3 were derived from Thau Lagoon oysters. To clarify, we updated Table 3 in Figure 3 and Supplementary Figure S3 to explicitly display hemocyte counts for oysters from both La Tremblade and Thau Lagoon. These data confirm consistent proportions of hemocyte types across both sources, with no significant differences (p > 0.05).

      Hemocyte isolation and filtering protocols were rigorously optimized to preserve cell viability and morphology during scRNA-seq library preparation. Viability assays and cytological evaluations confirmed that these procedures did not significantly alter hemocyte populations or their proportions. Sample processing times were minimized to ensure that the scRNA-seq results accurately reflect the native state of the hemolymph.

      Taken together, our results confirm that variability between oyster sources or methodological processes did not compromise our findings. This ensures that the correlations between morphological and transcriptomic data are reliable and robust.

      (2) The authors claim to use pathogen-free adult oysters (lines 95 and 119), but no supporting data is provided. It's unclear if the oysters were tested for bacterial and viral contaminations, particularly Vibrio and OsHV-1 μVar herpesvirus.

      The oysters used in this study were sourced from two distinct origins. First, the animals (18 months old) utilized for scRNA-seq and cytological analyses were obtained from the Ifremer controlled farm located in La Tremblade, France (GPS coordinates: 45.7981624714465, -1.150171788447683). This facility exclusively produces standardized oysters bred in controlled conditions with filtered seawater, entirely isolated from known environmental pathogens. The oysters from this source are certified “pathogen-free” upon arrival at the laboratory, following Ifremer's stringent quality control protocols. We have replaced the term “pathogen-free” with “known pathogen-free” (line 123) to accurately reflect the animals' true status.

      Second, for the fractionation experiments and functional tests, oysters were either sourced from the aforementioned Ifremer farm or from a producer located in the Thau Lagoon, France (GPS coordinates: 43.44265228308842, 3.6359883059292057). The Thau Lagoon is subject to comprehensive environmental and microbiological surveillance by the Ifremer monitoring network and the regional veterinary laboratory. For these experiments, we specifically selected oysters aged 18 months - an age associated with reduced susceptibility to OsHV-1 μVar herpesvirus - and ensured that sampling occurred outside of any detected epidemic periods. Furthermore, prior to experimentation, hemocyte samples from all oysters were examined. Oysters showing signs of contamination or exhibiting abnormal hemocyte profiles were excluded from the study.

      These measures ensured that the oysters used in this work were of high health status and minimized the likelihood of bacterial or viral contamination, including Vibrio and OsHV-1 μVar.

      (3) The KEGG and Gene Ontology analyses, while informative, are very descriptive and lack interpretation. The use of heatmaps with dendrograms for grouping cell clusters and GO terms is not discussed in the results, missing an opportunity to explore cell-type relationships. The changing order of cell clusters across panels B, C, and D in Figure 2 makes it challenging to correlate with panel A and to compare across different GO term categories. The dendrograms suggest proximity between certain clusters (e.g., 4 and 1) across different GO term types, implying similarity in cell processes, but this is not discussed. Grouping GO terms as in Figure 2A, rather than by dendrogram, might provide a clearer visualization of main pathways. Lastly, a more integrated discussion linking GO term and KEGG pathway analyses could offer a more comprehensive view of cell type characteristics. The presentation of scRNAseq results lacks depth in interpretation, particularly regarding the potential roles of different cell types based on their transcriptional profiles and marker genes. Additionally, some figures (2B, C, D, and 7C to H) suffer from information overload and small size, further hampering readability and interpretation.

      Thank you for your valuable suggestions regarding the presentation and interpretation of our KEGG and Gene Ontology (GO) analyses. In response, we revised Figure 2 to enhance clarity and provide deeper insights into cell-type relationships and biological processes.

      The revised Figure 2 reorganizes the GO term analysis into a more intuitive layout, grouping related biological processes and pathways in a structured manner. This approach replaces the dendrogram organization and provides a clearer visualization of key pathways for each cell cluster.

      (4) The pseudotime analysis presented in the study provides modest additional information to what is already manifest from the clustering and UMAP visualization. The central and intermediate transcriptomic profile of cluster 4 relative to other clusters is apparent from the UMAP and the expression of shared marker genes across clusters (as shown in Figure 1D). The statement by the authors that 'the two types of professional phagocytes belong to the same granular cell lineage' (lines 594-596) should be formulated with more caution. While the pseudotime trajectory links macrophage-like (ML) and small granule-like (SGC) cells, this doesn't definitively establish a direct lineage relationship. Such trajectories can result from similarities in gene expression induced by factors other than lineage relationships, such as responses to environmental stimuli or cell cycle states. To conclusively establish this lineage relationship, additional experiments like cell lineage tracing would be necessary, if such tools are available for C. gigas.

      We appreciate the reviewer’s detailed feedback on the pseudotime analysis and its interpretation. While we acknowledge that the clustering and UMAP visualization provide valuable insights, the pseudotime analysis offers a complementary approach by highlighting significantly expressed genes, including key transcription factors, that might otherwise be overlooked in differential expression analysis based solely on Log2FC between clusters. In our study, the pseudotime analysis revealed transcription factors known to play crucial roles in hemocyte differentiation, providing additional depth to our understanding of hemocyte lineage relationships and functional specialization.

      Regarding the statement on lines 594 - 596, we agree that the evidence provided by pseudotime trajectories does not definitively establish a direct lineage relationship between macrophage-like (ML) and small granule-like (SGC) cells. Instead, these trajectories suggest potential developmental connections that warrant further investigation. We propose the following revised sentence (lines 616 to 618):

      "The pseudotime trajectory linking macrophage-like (ML) and small granule-like (SGC) cells suggests a potential developmental relationship within the granular cell lineage; however, this hypothesis requires further validation."

      We also concur with the reviewer that additional experiments, such as cell lineage tracing, would be necessary to definitively establish this relationship. Unfortunately, the long-term cultivation of hemocytes in C. gigas is currently not feasible. However, we are planning to develop FACS-based approaches to separate the seven hemocyte subtypes, which will allow us to refine their ontology and explore their potential lineage relationships more precisely.

      (6) Given the mention of herpesvirus as a major oyster pathogen, the lack of discussion on genes associated with antiviral immunity is a notable omission. While KEGG pathway analysis associated herpesvirus with cluster 1, the specific genes involved are not elaborated upon.

      Thank you for your valuable observation regarding the lack of discussion on genes associated with antiviral immunity, particularly in the context of herpesvirus infection. The KEGG pathway analysis indeed identified a weak signature associated with herpesvirus in Cluster 1, primarily involving genes encoding beta integrins. In humans, beta integrins have been described as receptors facilitating herpesvirus entry (1). However, in the case of the naive oysters used in this study, the KEGG signature was subtle, likely reflecting the absence of active viral infection. Additionally, beta integrins are multifunctional molecules that also play critical roles in processes such as cell adhesion, a function attributed to hyalinocytes, as highlighted in our results.

      Given the naive status of the oysters and the weak antiviral signature observed, we chose not to discuss these findings in detail in this study. However, ongoing work in our laboratory aims to further investigate the specific hemocyte populations targeted by OsHV-1, which may shed light on the role of integrins in antiviral immunity in oysters.

      We hope this clarifies our approach and the context of the KEGG findings. Thank you for bringing this important perspective to our attention.

      (7) The discussion misses an opportunity for comparative analysis with related species. Specifically, a comparison of gene markers and cell populations with Crassostrea hongkongensis, could highlight similarities and differences across systems.

      In response to the reviewer’s comment, we have added a comparative analysis between C. hongkongensis and C. gigas hemocyte populations, situating our findings within the broader context of invertebrate immune cell diversity and specialization (lines 747 to 760).

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 92-93: The authors should add references associated with transcriptomic studies of C. gigas hemocytes.

      Thank you for pointing this out. In the revised manuscript, we have added references to previous transcriptomic studies of C. gigas hemocytes (line 83).

      (2) Line 121 and 127: The authors should clarify whether 3,000 represents the number of cells loaded or their target for analysis.

      The number of cells processed was optimized to minimize the occurrence of doublets during scRNA-seq. Following 10x Genomics Chromium guidelines, we loaded 4,950 cells to recover a target of 3,000 cells, with a doublet rate of 2.4%, below the target threshold of 2.5%. This information has been added on line 125 of the manuscript. As reported in Supplementary Table S1, the estimated number of cells after STAR-solo alignment was 2,937, close to this target, which supports the reliability and accuracy of the single-cell transcriptomic data.
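
      As a rough check of the figures above, the following minimal sketch assumes the commonly quoted rule of thumb for droplet-based runs of about 0.8% multiplets per 1,000 recovered cells. That rate, the constant name, and the function names are illustrative assumptions and are not taken from the manuscript.

```python
# Rule-of-thumb multiplet estimate for a droplet-based scRNA-seq run.
# ASSUMPTION: ~0.8% multiplets per 1,000 recovered cells (approximate guidance,
# not a value reported in the manuscript).
MULTIPLET_RATE_PER_1000_CELLS = 0.008


def expected_multiplet_rate(recovered_cells: int) -> float:
    """Return the expected multiplet fraction for a number of recovered cells."""
    return MULTIPLET_RATE_PER_1000_CELLS * recovered_cells / 1000


def recovery_fraction(loaded_cells: int, recovered_cells: int) -> float:
    """Return the fraction of loaded cells that end up recovered."""
    return recovered_cells / loaded_cells


target_rate = expected_multiplet_rate(3000)    # target of 3,000 cells
aligned_rate = expected_multiplet_rate(2937)   # cells estimated after alignment
capture = recovery_fraction(4950, 3000)        # 4,950 loaded for 3,000 targeted
print(f"{target_rate:.1%} {aligned_rate:.2%} {capture:.1%}")
```

      Under this assumption, the 3,000-cell target corresponds to the 2.4% rate quoted above, and the 2,937 cells actually estimated stay under the 2.5% threshold.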

      (3) Line 129: "Supp. Table 1" in the text and "Supp. Table S1" in the figure title should be edited.

      The inconsistency between "Supp. Table 1" in the text and "Supp. Table S1" in the figure title has been corrected for uniformity throughout the manuscript (line 134).

      (4) Line 138-139: The authors should clarify that the heatmap displays the top 10 positively enriched marker genes for each cluster, as identified by Seurat's differential expression analysis. It is important to note that the analysis does not explicitly show under-represented transcripts, but rather highlights the contrast between cluster-specific overexpressed genes and their lower expression in other clusters.

      We have clarified that the heatmap displays the top 10 positively enriched marker genes for each cluster, as identified by Seurat's differential expression analysis, and that the analysis highlights cluster-specific overexpressed genes rather than explicitly showing under-represented transcripts (lines 143 - 145).
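
      To illustrate what "top 10 positively enriched marker genes" means in practice, the sketch below filters a marker table in plain Python. The field names imitate a typical differential-expression output, the gene names are placeholders, and ranking by log2 fold change with an adjusted p-value cutoff of 0.05 is an assumption about the selection rule rather than a description of our exact pipeline.

```python
from collections import defaultdict


def top_positive_markers(rows, n=10, alpha=0.05):
    """Return up to n most enriched genes per cluster.

    Only genes with a positive log2 fold change and an adjusted p-value
    below alpha are kept, so depleted genes never appear in the result.
    """
    by_cluster = defaultdict(list)
    for row in rows:
        if row["avg_log2FC"] > 0 and row["p_val_adj"] < alpha:
            by_cluster[row["cluster"]].append(row)
    return {
        cluster: [
            r["gene"]
            for r in sorted(hits, key=lambda r: r["avg_log2FC"], reverse=True)[:n]
        ]
        for cluster, hits in by_cluster.items()
    }


# Hypothetical rows, for illustration only.
rows = [
    {"cluster": 1, "gene": "geneA", "avg_log2FC": 2.1, "p_val_adj": 1e-8},
    {"cluster": 1, "gene": "geneB", "avg_log2FC": -1.5, "p_val_adj": 1e-6},  # depleted
    {"cluster": 1, "gene": "geneC", "avg_log2FC": 3.0, "p_val_adj": 0.20},   # not significant
    {"cluster": 2, "gene": "geneD", "avg_log2FC": 0.9, "p_val_adj": 0.01},
    {"cluster": 2, "gene": "geneE", "avg_log2FC": 1.4, "p_val_adj": 0.001},
]
markers = top_positive_markers(rows)
```

      Because negative fold changes are discarded before ranking, a heatmap built from such a list contrasts cluster-specific overexpression with lower expression elsewhere and does not report under-represented transcripts.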

      (5) Figure 1: The authors should consider improving or potentially removing Figure 1C. The gene IDs are not readable due to their small size, which significantly reduces the informative value of the figure. In addition, the data presented in this heatmap is largely redundant with the more informative and readable dot plot in Figure 1D, which shows both expression levels and the percentage of cells expressing each gene.

      Thank you for your suggestion regarding Figure 1C. In the revised manuscript, we have removed the original panel C from the main figure and transferred it to Supplementary Figure S1K, which improves readability while retaining the relevant data. We have also renumbered the remaining panels for clarity, with the former panel D now designated as panel C. We believe these adjustments address the reviewer’s concerns and streamline the presentation of the data.

      (6) Table 1: The authors should clarify in the legend the statistical significance criteria (adjusted p-value) for the genes listed.

      As requested, we have added the adjusted p-value threshold (adj. p-value < 0.05) to the legend of Table 1.

      (7) Line 188: The authors should align the text description of the KEGG pathways in cluster 7 with Figure 2A, describing Wnt signaling pathway and clarifying the terminology "endosome pathway" to ensure consistency.

      In the revised text, we have aligned our description with Figure 2A by explicitly mentioning the Wnt signaling pathway in cluster 7 (lines 193 to 194).

      The endo-lysosomal pathway encompasses a series of membrane-bound compartments and trafficking events responsible for the uptake of macromolecules from the extracellular environment, their subsequent sorting in endosomes, and eventual degradation in lysosomes. This pathway is tightly regulated, ensuring not only the breakdown of macromolecules but also the recycling of membrane components and signaling receptors essential for maintaining cellular homeostasis (2). In our study, the KEGG signatures of cluster 7 highlight the involvement of the endo-lysosomal pathway.

      (8) Line 223: The authors should revise the description of cluster 1, avoiding references to morphology at this point in the manuscript, as no morphological data has been presented yet.

      We have revised the description of Cluster 1 to remove references to morphology, ensuring consistency with the data presented at this stage of the manuscript (lines 227 to 229): “Cluster 1, comprising 27.6% of cells, is characterized by GO-terms related to myosin complex, lamellipodium, membrane and actin cytoskeleton remodelling, as well as phosphotransferase activity.”

      (9) Figure 2: The authors should revise Figure 2 to improve the clarity. For Figure 2A, they should address the redundancy in the "Global and overview maps" category by removing overlapping pathways such as carbon metabolism and biosynthesis of amino acids, which are likely represented in more specific metabolic categories (glycolysis, pentose). They could consider grouping similar pathways together, such as combining "Amino acid metabolism" with "Metabolism of other amino acids," and separating metabolic pathways from cellular processes for easier interpretation. They should also address the surprising absence of certain expected pathways like lipid metabolism, nucleotide metabolism, and cofactor/vitamin metabolism, as well as cellular processes such as cell growth and chromatin modeling. Even if these pathways are not enriched in specific clusters, mentioning their absence could provide valuable context for the reader.

      In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation.

      (10) For Figures 2B, C, and D, the authors should significantly increase the font size of text and numbers, ensuring readability at 100% scale in PDF format. They could also add labels directly on each graph to clearly indicate the type of GO terms represented, (Biological Process, Cellular Component, or Molecular Function).

      In the revised version of the manuscript, we propose a new representation to facilitate understanding and improve the clarity of the data presentation.

      (11) Line 247-250: The authors should revise their description of cell types to follow the same order as presented in Figure 3A.

      We have revised the description of cell types in the manuscript to follow the same order as presented in Figure 3A, as requested.

      (12) Line 265-266: The authors should develop the significance of the nucleo-cytoplasmic ratio in hemocyte morphology and identification.

      We thank the editor for bringing this to our attention and apologize for the discrepancy between the terminology used in the text and the results presented in Figure 3. The text refers to the nuclear-to-cytoplasmic ratio (N/C), while the figure mistakenly displays the inverse ratio, cytoplasmic-to-nuclear ratio (C/N). We recognize that this inversion may cause confusion and will ensure consistency between the text and the figure.

      To address this, we propose correcting the figure legend and labels in Figure 3 to align with the terminology used in the text (N/C ratio). This will prevent confusion and maintain clarity throughout the manuscript.

      The nuclear-to-cytoplasmic (N:C) ratio, also known as the nucleus:cytoplasm ratio or N/C ratio, is a well-established measurement in cell biology that reflects the relative size of the nucleus to the cytoplasm. This ratio is frequently used as a morphologic feature in the diagnosis of atypia and malignancy in human cells, underscoring its diagnostic value. In the context of our study, we use the N:C ratio to provide a more precise and quantitative description of hemocyte types in Crassostrea gigas. Specifically, the N:C ratio allows us to distinguish between different hemocyte morphotypes, such as blasts and granular cells, and to enrich the characterization of their functional specialization. This quantitative measure supports the morphological classification and enhances the reproducibility and clarity of hemocyte identification.
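
      The relationship between the two conventions can be made explicit with a short sketch. It assumes the ratio is computed from measured areas, with the cytoplasm taken as the cell area minus the nuclear area; the function names and the example areas are hypothetical and are not measurements from this study.

```python
def nc_ratio(nuclear_area: float, cell_area: float) -> float:
    """Nuclear-to-cytoplasmic ratio: nuclear area over cytoplasmic area.

    ASSUMPTION: cytoplasmic area is the whole-cell area minus the nuclear area.
    """
    cytoplasm_area = cell_area - nuclear_area
    if nuclear_area <= 0 or cytoplasm_area <= 0:
        raise ValueError("areas must satisfy 0 < nuclear_area < cell_area")
    return nuclear_area / cytoplasm_area


def cn_ratio(nuclear_area: float, cell_area: float) -> float:
    """Inverse convention (cytoplasmic-to-nuclear): the reciprocal of nc_ratio."""
    return 1.0 / nc_ratio(nuclear_area, cell_area)


# Hypothetical areas in square micrometres, for illustration only.
large_nucleus = nc_ratio(nuclear_area=30.0, cell_area=40.0)
small_nucleus = nc_ratio(nuclear_area=20.0, cell_area=100.0)
inverse_small = cn_ratio(nuclear_area=20.0, cell_area=100.0)
```

      Since C/N is simply the reciprocal of N/C, a figure labelled with one convention ranks the cell types in the opposite order to text written with the other, which is the source of the confusion noted above.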

      (13) Line 286-294: The authors should review and correct the legend for Figure 3. It seems that the description of results related to Figure 3C has been mistakenly inserted into the legend.

      We thank the reviewer for pointing out this issue with the legend of Figure 3. The description of results related to Figure 3C has now been removed from the legend. The revised legend focuses solely on the figure elements, improving clarity and consistency. We believe this adjustment addresses the reviewer's comment effectively.

      (14) Figure 3: The authors should revise the legend for Figure 3A to provide more detailed and explicit descriptions of the "Size, shape and particularities" of the ML, SGC, BGC, and VC hemocyte types.

      We thank the reviewer for their insightful suggestion to provide more explicit descriptions in the legend for Figure 3A. We have revised the legend to include detailed explanations of the "Size, shape, and particularities" for the ML, SGC, BGC, and VC hemocyte types. Specifically, we have clarified that size refers to the average granule diameter, shape describes the morphology of the granules (e.g., spherical or elongated), and particularities highlight distinguishing features such as granule color or fluorescence properties observed under specific staining or imaging conditions. We believe this updated legend provides the level of detail requested and enhances the clarity of the figure (lines 294 - 297).

      (15) Figure 4: The authors should clarify the method used for calculating relative gene expression in Figure 4A and Figure 6. They should explicitly state in the figure legend that the expression was normalized to the Cg-rps6 reference gene, as mentioned in line 835. The authors should also provide details on the calculation method used (e.g., 2^-ΔCt method) and confirm whether the reference gene was expressed at similar levels across all clusters.

      We thank the reviewer for pointing out the need for additional clarity regarding the calculation of relative gene expression in Figures 4A and 6. To address this, we have revised the legends for both figures to explicitly state that gene expression levels were normalized to the reference gene Cg-rps6 and calculated using the 2^-ΔCt method. We have also confirmed that Cg-rps6 was stably expressed across all hemocyte clusters and explicitly mentioned this in the revised legends. These changes ensure greater transparency and address the reviewer’s concerns (lines 342 to 346).
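
      For clarity, the 2^-ΔCt calculation can be written as a minimal sketch. It takes ΔCt as Ct(target) minus Ct(reference) and implicitly assumes an amplification efficiency of 100%; the function name and the Ct values are hypothetical and serve only as an illustration.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression by the 2^-ΔCt method.

    ΔCt = Ct(target) - Ct(reference); a higher Ct for the target than for
    the reference gives a value below 1.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values; the reference plays the role of Cg-rps6.
ct_rps6 = 18.0
same_level = relative_expression(ct_target=18.0, ct_reference=ct_rps6)
three_cycles_later = relative_expression(ct_target=21.0, ct_reference=ct_rps6)
```

      A target that crosses the threshold three cycles after the reference is therefore reported at one eighth of the reference level, which is why stable expression of the reference gene across samples matters.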

      (16) The authors could consider removing or modifying Figure 4B, as it appears to be redundant with Figure 3C. Both figures show the average percentage of each hemocyte type in the seven Percoll gradient fractions.

      We thank the reviewer for highlighting potential redundancy between Figures 3C and 4B. While both figures present the distribution of hemocyte types across Percoll gradient fractions, Figure 4B serves a distinct and critical purpose in the manuscript. Specifically, it provides the numerical data necessary to understand the correlations shown in Figure 4A, where we analyze the relationship between gene expression levels and the distribution of hemocyte types. These detailed percentages are essential for interpreting the statistical robustness and biological relevance of the correlation matrix, which could not be derived solely from the qualitative visualization in Figure 3C.
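
      The kind of calculation that links the two panels can be sketched as follows. The example uses a Pearson coefficient, which is an assumption made for illustration (this response does not specify the coefficient), and the seven values per series are invented placeholders rather than data from Figure 4.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical values across seven Percoll fractions, for illustration only.
pct_cell_type = [5.0, 10.0, 20.0, 40.0, 60.0, 30.0, 10.0]  # % of one hemocyte type
marker_expr = [0.1, 0.2, 0.4, 0.8, 1.2, 0.6, 0.2]          # relative marker expression
r = pearson(pct_cell_type, marker_expr)
```

      A coefficient close to 1 for a given marker and cell type indicates that the marker is most abundant in the fractions where that cell type is enriched, and computing it requires the per-fraction percentages.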

      (17) Figure 5: The authors should address the redundancy between Figure S7B and Figure 5B, as they appear to present the same data. In Figure S7B, "SGC" is incorrectly abbreviated as "G".

      In the revised version of the manuscript, we have addressed the redundancy between the two figures and corrected the SGC abbreviation in Figure S7B.

      (18) Line 412: The authors should correct the typographical error, changing "Pecoll" to "Percoll".

      In the revised version of the manuscript, we have corrected this typographical error (line 417).

      (19) Line 417: The statement about the inhibitor apocynin likely refers to Figure 5D, not Figure 5C.

      In the revised version of the manuscript, we have corrected this reference error to accurately refer to Figure 5D (line 422).

      (20) Line 441-444: The authors should provide references to support their annotation of cluster 1 as macrophage-like cells based on macrophage-specific genes. These references should cite established literature on known macrophage gene markers, particularly in bivalves or related species if available. They need to clarify whether specific gene markers exist for each of the hemocyte morphotypes they have identified. If such markers are known from previous studies, they should be mentioned and referenced.

      We propose to modify lines 446 to 449 to address the reviewer's concerns. Cluster 1, which we have termed "macrophage-like" due to its pronounced phagocytic activity and reactive oxygen species (ROS) production, is enriched in Angiopoietin-1 receptor expression (Table 1). Angiopoietin receptors belong to the Tie receptor family, which is expressed in a subset of macrophages known as Tie2-expressing monocytes (TEMs) in humans (3). While our analysis reveals a strong overexpression of the Angiopoietin-1 receptor, we acknowledge that this receptor is not an exclusive marker for macrophages.

      In bivalves, including oysters, no definitive molecular markers have been established for macrophage-like cells as they are defined functionally in this study. Consequently, the identification of such cells relies on their functional characteristics rather than strict marker expression. To clarify, we propose the following revision to the sentence:

      Furthermore, this cluster expresses macrophage-related genes, including the macrophage-expressed gene 1 protein (G30226) (Supp. Data S1), along with maturation factors for dual oxidase, an enzyme involved in peroxide formation (Supp. Fig. S8), supporting its designation as macrophage-like based on functional characteristics.

      (21) Figure 7: For Figures 7C to 7H, the authors should increase the font size of gene names and descriptions to ensure legibility in both printed versions and digital formats. To simplify these figures, the authors could consider displaying less differentially expressed genes for each lineage, along with the top genes for each differentiation pathway. If detailed gene information is crucial, they could move the full list to a supplementary table and reference it in the figure legend. Regarding Figure 7I, the authors should reorder the transcription factor genes by cluster and specificity to improve visualization and interpretation, like in Figure 1D.

      Thank you for these valuable suggestions regarding Figure 7. We have revised Figures 7C–H to ensure improved readability. Furthermore, we have simplified these panels by highlighting fewer differentially expressed genes for each lineage. In Figure 7I, we have reordered the transcription factor genes by cluster and specificity, following a layout similar to Figure 1D, to facilitate clearer visualization and interpretation of the data.

      (22) Line 490: The authors should provide more precise references to the specific GO terms and figure panels they are discussing.

      To address this comment, we have revised the sentence and provided additional information in the text to clearly indicate where the corresponding figure panels can be found in the manuscript (line 499).

      (23) Line 510: The authors state that "5 cell lineages could be defined," but the subsequent text and Figure 7C to H actually present 6 distinct lineages.

      We have corrected this in the manuscript: six lineages could be defined (line 521).

      (24) Line 534: The authors should consider further investigating the pluripotent potential of cluster 4 cells by exploring known or potential stem cell markers in their scRNAseq data.

      Thank you for highlighting the possibility of pluripotent potential of cluster 4. In our current analysis, we did not detect any known stem cell or proliferative markers, nor evidence of a clearly defined hematopoiesis site in the hemolymph. Indeed, previous work suggests that oyster hematopoiesis may occur in tissues such as the gills, implying that stem or progenitor cells might not circulate in the hemolymph under homeostatic conditions. Consequently, it is plausible that our observation of no proliferative cell populations partly reflects their absence in hemolymph, especially in naïve (unstimulated) oysters. To conclusively identify potential progenitor cells and their proliferative activity, further approaches involving deliberate perturbation of hemocyte homeostasis - such as immunological challenge (e.g., Zymosan treatment) combined with lineage-tracing or proliferation assays - would be necessary. These future investigations would not only clarify whether proliferative cells emerge in the hemolymph in response to environmental or pathological stimuli but also help elucidate the broader cellular pathways underlying oyster immune responses.

      In response to the reviewer’s comment, we have revised the Discussion (lines 695 to 696) and added: “Nevertheless, we did not detect any canonical stem or progenitor cell populations in our dataset, underscoring the need for future investigations - potentially involving immunological challenges and lineage-tracing assays - to clarify whether proliferative cells circulate in the hemolymph or instead reside primarily in tissue compartments.”

      (25) Figure S10: The authors should significantly improve the readability of Figure S10 by increasing the font size. Currently, the small font size makes it impossible for readers to discern the information presented.

      Thank you for highlighting the readability concerns regarding Figure S10. In response to your comment, we have increased the overall size and font of the figure, ensuring that all labels and legends are clearly legible in both printed and digital formats. We believe these adjustments will allow readers to more easily interpret the information presented.

      (26) Line 896: The authors should correct the typographical error on line 896 by deleting the additional bracket.

      In the revised version of the manuscript, we have corrected this typographical error.

      (27) Figure S12: The authors should address the absence of any reference to Figure S12 in the main text of the manuscript.

      The reference to Supp. Figure S12 has been corrected. It was a referencing error between Supp. Figure S11 (in the Discussion, line 670) and Supp. Figure S12.

      Bibliography:

      (1) G. Campadelli-Fiume, D. Collins-McMillen, T. Gianni, A. D. Yurochko, Integrins as Herpesvirus Receptors and Mediators of the Host Signalosome. Annual Review of Virology 3, 215–236 (2016).

      (2) J. P. Luzio, P. R. Pryor, N. A. Bright, Lysosomes: fusion and function. Nat Rev Mol Cell Biol 8, 622–632 (2007).

      (3) A. S. Harney, E. N. Arwert, D. Entenberg, Y. Wang, P. Guo, B.-Z. Qian, M. H. Oktay, J. W. Pollard, J. G. Jones, J. S. Condeelis, Real-Time Imaging Reveals Local, Transient Vascular Permeability, and Tumor Cell Intravasation Stimulated by TIE2hi Macrophage-Derived VEGFA. Cancer Discov 5, 932–943 (2015).

      (4) M. De Palma, R. Mazzieri, L. S. Politi, F. Pucci, E. Zonari, G. Sitia, S. Mazzoleni, D. Moi, M. A. Venneri, S. Indraccolo, A. Falini, L. G. Guidotti, R. Galli, L. Naldini, Tumor-targeted interferon-alpha delivery by Tie2-expressing monocytes inhibits tumor growth and metastasis. Cancer Cell 14, 299–311 (2008).

      (5) M. De Palma, M. A. Venneri, R. Galli, L. Sergi Sergi, L. S. Politi, M. Sampaolesi, L. Naldini, Tie2 identifies a hematopoietic lineage of proangiogenic monocytes required for tumor vessel formation and a mesenchymal population of pericyte progenitors. Cancer Cell 8, 211–226 (2005).

      Reviewer #3 (Public review):

      The paper addresses pivotal questions concerning the multifaceted functions of oyster hemocytes by integrating single-cell RNA sequencing (scRNA-seq) data with analyses of cell morphology, transcriptional profiles, and immune functions. In addition to investigating granulocyte cells, the study delves into the potential roles of blast and hyalinocyte cells. A key discovery highlighted in this research is the identification of cell types engaged in antimicrobial activities, encompassing processes such as phagocytosis, intracellular copper accumulation, oxidative bursts, and antimicrobial peptide synthesis.

      A particularly intriguing aspect of the study lies in the exploration of hemocyte lineages, warranting further investigation, such as employing scRNA-seq on embryos at various developmental stages.

      In the opinion of this reviewer, the discussion should compare and contrast the transcriptome characteristics of hemocytes, particularly granule cells, across the three species of bivalves, aligning with the published scRNA-seq studies in this field to elucidate the uniformities and variances in bivalve hemocytes.

      Reviewer #3 (Recommendations for the authors):

      Minor Concerns:

      (1) In the context of C. gigas, the notable expansion of stress and immune-related genes in its genome stands out. It is anticipated that the article will discuss the expression patterns of classical immune-related genes like TLR and RLR across different cell clusters.

      We appreciate the reviewer's interest in the expression patterns of classical immune-related genes, such as Toll-like receptors (TLRs) and RIG-I-like receptors (RLRs), across different cell clusters in Crassostrea gigas. In our single-cell RNA sequencing (scRNA-seq) analysis, we did not detect significant expression of TLR or RLR genes. This absence can be attributed to several factors. First, technical limitations of scRNA-seq: the droplet-based scRNA-seq technology employed in our study captures only a fraction of the transcripts present in each cell, approximately 10–20% (https://kb.10xgenomics.com/hc/en-us/articles/360001539051-What-fraction-of-mRNA-transcripts-are-captured-per-cell). This inherent limitation often results in the underrepresentation of genes with low expression levels. Consequently, TLRs and RLRs, which may be expressed at low levels in certain hemocytes, could go undetected because of this capture inefficiency. Second, TLRs are typically expressed at low basal levels under resting conditions and are upregulated in response to specific stimuli or pathogenic challenges (1, 2). Given that our study analyzed hemocytes in their basal state, the expression levels of these receptors may have been below the detection threshold of the scRNA-seq platform. Furthermore, as highlighted by De Lorgeril et al. (3), the expression of these immune receptors varies depending on the resistance of the oyster. This variability further underscores the dynamic and context-dependent nature of TLR and RLR expression.

      To comprehensively assess the expression patterns of TLRs and RLRs across different hemocyte clusters, future studies could incorporate targeted enrichment strategies, such as bulk RNA-seq or single-cell technologies with higher capture efficiencies. Additionally, analyzing hemocytes under stimulated conditions or comparing oysters with varying levels of resistance could provide insights into the inducible and context-specific expression of these immune receptors.
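      The capture-efficiency argument above can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the authors' analysis: it assumes each transcript in a cell is captured independently with the quoted 10–20% probability, and the function name and the example copy numbers are hypothetical.

```python
def detection_probability(transcripts_per_cell: int, capture_rate: float) -> float:
    """Probability that at least one copy of a gene's mRNA is captured in a cell.

    Assumption (not from the source): every transcript is captured
    independently with the same probability `capture_rate`.
    """
    return 1.0 - (1.0 - capture_rate) ** transcripts_per_cell


if __name__ == "__main__":
    # Capture rates taken from the 10-20% range quoted in the response;
    # the copy numbers per cell are hypothetical.
    for capture_rate in (0.10, 0.20):
        for copies in (1, 2, 5, 20):
            p = detection_probability(copies, capture_rate)
            print(f"capture={capture_rate:.2f} copies={copies:2d} -> P(detected)={p:.3f}")
```

      Under this simplified model, a receptor present at one or two copies per cell would be missed in most cells, which is consistent with the explanation given for the apparent absence of TLR and RLR transcripts.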

      (2) Clarification is needed in lines 265-266 regarding the nucleo-cytoplasmic ratio (N/C) terminology to prevent confusion, considering the discrepancy with the results presented in Figure 3.

      We thank the editor for bringing this to our attention and apologize for the discrepancy between the terminology used in the text and the results presented in Figure 3. The text refers to the nuclear-to-cytoplasmic ratio (N/C), while the figure mistakenly displays the inverse ratio, cytoplasmic-to-nuclear ratio (C/N). We recognize that this inversion may cause confusion and will ensure consistency between the text and the figure.

      To address this, we propose correcting the figure legend and labels in Figure 3 to align with the terminology used in the text (N/C ratio). This will prevent confusion and maintain clarity throughout the manuscript.

      (3) The selection of cluster 4 as the root for pseudotime analysis based on high ribosomal protein expression raises questions. It would be beneficial to elaborate on the inclusion of other genes, such as cell cycle or mitotic-related genes, to validate the pseudotime analysis outcomes.

      We appreciate the reviewer’s insightful comment on the significance of ribosomal proteins in stem cell maintenance.

      Hematopoietic stem cells (HSCs) are a population of stem cells that are largely cell-cycle-quiescent (G0 phase) with low biosynthetic activity. Upon stimulation and stress, HSCs undergo proliferation and differentiation and produce all lineages of hemocytes.

      Ribosomal proteins play a multifaceted role in preserving the balance between stem cell quiescence and activation. By ensuring precise regulation of protein synthesis, they allow stem cells to maintain their undifferentiated state while remaining poised for activation when needed. Furthermore, ribosomal proteins contribute to the cellular stress response, safeguarding stem cells from oxidative damage and other stressors that could compromise their functionality. Importantly, ribosomal biogenesis and the dynamic assembly of ribosomes provide a regulatory mechanism that fine-tunes the transition from self-renewal to differentiation, a critical feature of hematopoietic stem cells (HSCs) and other stem cell types. These mechanisms collectively highlight the indispensable role of ribosomal proteins in stem cell biology, underscoring their relevance to our study's findings.

      In vertebrates, the maintenance of hematopoietic stem cells (HSCs) and hematopoietic homeostasis is widely acknowledged to rely on the proper regulation of ribosome function and protein synthesis (4). This process necessitates the coordinated expression of numerous genes, including genes that encode ribosomal proteins (RP genes) and those involved in regulating ribosome biogenesis and protein translation. Disruptions or mutations in these critical genes are associated with the development of congenital disorders (5). Among these, Rpl22 (found in cluster 4 with a Log2FC of 1.59) has been shown to play a pivotal role in HSC maintenance by balancing ribosomal protein paralog activity, which is critical for the emergence and function of HSCs (6).

      (4) What is the resolution of the cell clustering employed in the study? Given that cluster 1 potentially encompasses two distinct cell types, Macrophage-Like and Big Granule cells, further sub-clustering efforts and correlation analyses between cluster markers and cell morphologies could aid in their differentiation.

      Thank you for your inquiry regarding the resolution of our cell clustering. As described in the Materials and Methods section, we used the Seurat FindClusters function with a resolution parameter of r = 0.1 for the scRNA-seq dataset. We performed sub-clustering within Cluster 1, resulting in four distinct subclusters. However, despite analyzing various specific markers, we did not identify any marker uniquely associated with the Big Granule Cell (BGC) morphology. Notably, LACC24 specifically marks a subset of cells within Cluster 1, as shown in Supplementary Figure S8, although this gene alone was insufficient to definitively distinguish a distinct BGC population.

      (5) Line 78's statement regarding the primary identification of three hemocyte cell types in C. gigas-blast, hyalinocyte, and granulocyte cells would benefit from including references to substantiate this claim.

      We thank Reviewer #1 for their valuable comments, which have allowed us to further improve our manuscript. We have enriched the introduction with the following addition (lines 79 to 82):

      “Blast-like cells are considered undifferentiated hemocyte types (Donaghy et al., 2010), hyalinocytes appear to play a key role in wound repair (de la Ballina et al., 2020), and granulocytes are primarily involved in immune surveillance. Among these, granulocytes are regarded as the main immunocompetent hemocyte type (Wang et al., 2017).”

      Conclusion:

      The authors largely achieved their primary objective of providing a comprehensive characterization of oyster immune cells. They successfully integrated multiple approaches to identify and describe distinct hemocyte types. The correlation of these cell types with specific immune functions represents a significant advancement in understanding oyster immunity. However, certain aspects of their objectives have not been fully achieved. The lineage relationships proposed on the basis of pseudotime analysis, while interesting, require further experimental validation. The potential of antiviral defense mechanisms, an important aspect of oyster immunity, has not been discussed in depth.

      This study is likely to have a significant impact on the field of invertebrate immunology, particularly in bivalve research. It provides a new standard for comprehensive immune cell characterization in invertebrates. The identification of specific markers for different hemocyte types will facilitate future research on oyster immunity. The proposed model of hemocyte lineages, while requiring further validation, offers a framework for studying hematopoiesis in bivalves.

      Bibliography:

      (1) J. Chen, J. Lin, F. Yu, Z. Zhong, Q. Liang, H. Pang, S. Wu, Transcriptome analysis reveals the function of TLR4-MyD88 pathway in immune response of Crassostrea hongkongensis against Vibrio Parahemolyticus. Aquaculture Reports 25, 101253 (2022).

      (2) Y. Zhang, X. He, F. Yu, Z. Xiang, J. Li, K. L. Thorpe, Z. Yu, Characteristic and Functional Analysis of Toll-like Receptors (TLRs) in the lophotrocozoan, Crassostrea gigas, Reveals Ancient Origin of TLR-Mediated Innate Immunity. PLOS ONE 8, e76464 (2013).

      (3) J. de Lorgeril, B. Petton, A. Lucasson, V. Perez, P.-L. Stenger, L. Dégremont, C. Montagnani, J.M. Escoubas, P. Haffner, J.-F. Allienne, M. Leroy, F. Lagarde, J. Vidal-Dupiol, Y. Gueguen, G. Mitta, Differential basal expression of immune genes confers Crassostrea gigas resistance to Pacific oyster mortality syndrome. BMC Genomics 21, 63 (2020).

      (4) R. A. J. Signer, J. A. Magee, A. Salic, S. J. Morrison, Haematopoietic stem cells require a highly regulated protein synthesis rate. Nature 509, 49–54 (2014).

      (5) A. Narla, B. L. Ebert, Ribosomopathies: human disorders of ribosome dysfunction. Blood 115, 3196–3205 (2010).

      (6) Y. Zhang, A.-C. E. Duc, S. Rao, X.-L. Sun, A. N. Bilbee, M. Rhodes, Q. Li, D. J. Kappes, J. Rhodes, D. L. Wiest, Control of Hematopoietic Stem Cell Emergence by Antagonistic Functions of Ribosomal Protein Paralogs. Developmental Cell 24, 411–425 (2013).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This paper provides a compelling analysis of chiton genomes, revealing extensive genomic rearrangements despite the group's apparent morphological stasis. By examining five reference-quality genomes, the study identifies 20 conserved molluscan linkage groups that are subject to significant rearrangements, fusions, and duplications in chitons, particularly in the basal Lepidopleurida clade. The high heterozygosity observed adds complexity to genome assembly but also highlights notable genetic diversity.

      We also note the comment from this reviewer that “more information is needed to clarify how this affects genome assembly and evolutionary outcomes.” We strongly agree; although it is outside the scope of this study, this may help develop future work on that topic.

      The research challenges the assumption that morphological stability implies genomic conservatism, suggesting that dynamic genome structures may play a role in species diversification. Although limited by the small number of molluscan genomes available for comparison, this study offers valuable insights into evolutionary processes and calls for further genomic exploration across molluscan clades. Some minor comments need to be tackled:

      (1) Line 39: 'major changes'. Please, better explain what you mean here?

      Clarified as major morphological change

      (2) Lines 70-73: refer to 'extant' cephalopods.

      Corrected

      (3) There is an inconsistency in the use of "Callochitonida" (lines 76, 85, 140, 145, Table S3, Figure S3) and "Chitonida s.l." (Figures 2, 3, and 4) throughout the text, figures, and supplementary material. To maintain clarity and avoid confusion, I recommend choosing one taxon and using it consistently across all sections of the manuscript. This will ensure coherence and help readers follow the discussion without ambiguity.

      An explanation has been added to the introduction and other instances in the text changed to Chitonida s.l. for consistency

      (4) Overall, the conclusions introduce several important topics and additional information that were not addressed earlier in the paper. It would enhance the coherence and impact of the study to introduce these points in the introduction, as they highlight the broader significance and relevance of the research. Integrating these key aspects earlier on would better frame the study's objectives and provide readers with a clearer understanding of its importance from the outset.

      The paragraph about chiton natural history and some additional lines have been moved to the introduction

      (5) Lines 242-245 and 254-256: While I agree with the authors on the remarkable results found in molluscs, particularly in polyplacophorans, I suggest toning down the comparisons with lepidopterans. The current framing may come across as dismissive towards butterflies, which does not seem necessary. It's true that biases exist in studying taxa that are more charismatic due to factors like diversity or aesthetic appeal, but the goal should be to emphasize the value of polyplacophorans without downplaying the significance of butterfly research. Instead, the focus should be on highlighting chitons as an exciting new model for understanding key evolutionary processes like synteny, polyploidy, and genome evolution. This shift would underscore the importance of polyplacophorans in a positive light without diminishing the value of lepidopteran studies.

      This sentence has been rephrased to adjust the tone of this paragraph

      (6) Figure 3: should read 'Polyplacophora'.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I hope these comments by line number are helpful, despite my lack of experience with comparative genomics:

      We note that the general comment from this reviewer that “most chiton genomes seem to be relatively conserved” may reflect a misunderstanding arising from our presentation; we have added some additional notes in the first part of the discussion to ensure that this is clear to all readers.

      The reviewer also pointed out that “geologically recent events that do not especially represent the general pattern of genome evolution across this ancient molluscan taxon”. To clarify, the (limited) phylogenetic evidence suggests these changes are a longer term pattern throughout chiton evolution, since chromosomal rearrangements are found when comparing congeneric species (Acanthochitona spp., Fig 4C) and also across orders (Fig 4B). This has been added to the conclusions, as this is clearly an important point that was not adequately explained in the original text.

      (1) Line 72: It is true that adaptive radiations occur and are an interesting general model for how diversification can lead to species-rich taxa. However, there are other "non-adaptive" processes that can lead to geographically isolated species that are not much differentiated in their ecological or morphological diversity. The sentence here implies that such adaptive radiation is a necessary correlation of species richness. I agree that chitons have hardly frozen in time since the Paleozoic.

      This is clarified by moving some additional natural history aspects of chitons to the introduction, also as suggested by the first reviewer

      (2) L113: I am curious about how this character optimization was accomplished to allow the authors to reconstruct the HAM (hypothetical ancestral mollusc) chromosome number as 20 when the range of variation in Polyplacophora is 6 to 16 (mode 11), and chitons are part of the sister taxon to conchiferans. Is this dependent on the chromosome numbers found in the outgroup?

      We inferred ancestral linkage groups (“chromosomes”) based on comparison with other gastropods and bivalves noted in the methods; the other study cited (Simakov et al. 2022) used a broader selection of metazoans and also predicted an ancestral Mollusca karyotype of 1N=20.

      (3) L116: "Using five chromosome-level genome assemblies for chitons, we reconstructed the ancestral karyotype for Polyplacophora (more strictly the taxonomic order Neoloricata), and all intermediate phylogenetic nodes to demonstrate the stepwise fusion and rearrangement of gene linkage groups during chiton evolution (Fig. 3)."

      This is probably fine, but I had to struggle to understand what genome events happened between the Acanthochitona species. Are the chromosomes merely ordered and numbered by chromosome size and the switch in position between chromosomes 1 and 3 just has to do with the chromosomes 4+5, so they become the largest chromosome, and the former 1 is now 3? Confusing! The way it is drawn it seems like this implies more genome rearrangement than occurred, whereas if the order was maintained it would be more obvious that there were simply two chromosome fusions.

      The linkage groups are numbered in order of size, which is the typical way they would each be presented if the taxon was illustrated alone. Here this allows the reader to understand how the fusions or rearrangements have shifted the volume of genetic information between groups especially in comparison to the molluscan or polyplacophoran ancestor. In Fig 4 we instead decided to present the linkage groups in a revised form, so that each transition from the nearest ancestor is visible in more detail. We have added these points in the figure caption for Fig 3 which should make it easier for new readers to understand the presentation.
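      The effect of size-ordered numbering on the labels can be illustrated with a toy example. This is only a sketch under stated assumptions: the linkage-group names and sizes are hypothetical and are not taken from the chiton assemblies, and `number_by_size` is an illustrative helper rather than part of the authors' pipeline.

```python
def number_by_size(groups: dict[str, int]) -> dict[str, int]:
    """Number linkage groups from largest (1) to smallest, as in a size-ordered karyotype."""
    ordered = sorted(groups, key=lambda name: groups[name], reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical ancestral linkage groups with arbitrary sizes.
ancestor = {"A": 50, "B": 45, "C": 40, "D": 30, "E": 25}

# A single fusion of the two smallest groups (D and E) in a descendant.
descendant = {name: size for name, size in ancestor.items() if name not in ("D", "E")}
descendant["D+E"] = ancestor["D"] + ancestor["E"]

if __name__ == "__main__":
    print(number_by_size(ancestor))
    print(number_by_size(descendant))
```

      In this toy case one fusion makes the fused group the largest, so every unchanged group moves down one position in the numbering even though its gene content is identical. This is the kind of label shift the reviewer found confusing when comparing the two Acanthochitona species.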

      (4) L481: Typo: A. rubrolineatain should be A. rubrolineata.

      Corrected

      (5) Figure 4: I am a little confused with what is meant by an "Ancestor" in these diagrams. For example, for comparing the two species of Acanthochitona with a hypothetical ancestor, it seems that the ancestor should be like one of the two, not different from both.

      I am looking at Ancestor "3" compared with the Acanthochitona rubrolineata "3" and A. discrepans "4". Again, I assume that the latter is "4" because it is slightly smaller than a new "3" and now the new "3" corresponds to "1" in the other Acanthochitona. This figure does help interpret Figure 3.

      On the point about reconstructing ancestral types: the two species both descended from a common ancestor. In morphology it is sometimes clear that one lineage retains more plesiomorphic character states, but in this case we must assume equal probability of change in any direction. The ancestor is a compromise that estimates the shortest distance to both descendants.

      We understand that the numbers were unclear and potentially distracting. An explanation has been added to the figure caption; we are grateful for the feedback, which will certainly help future readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates protein-protein interactions (PPIs) within the nuage, a germline-specific organelle essential for piRNA biogenesis in Drosophila melanogaster, using AlphaFold2 to predict interactions among 20 nuage-localizing proteins. The authors identify five novel interaction candidates and experimentally validate three of them, including Spindle-E and Squash, through co-immunoprecipitation assays. They confirm the functional significance of these interactions by disrupting salt bridges at the Spn-E_Squ interface. The study further expands its scope to analyze approximately 430 oogenesis-related proteins, validating three additional interaction pairs. A comprehensive screen of around 12,000 Drosophila proteins for interactions with the key piRNA pathway player, Piwi, identifies 164 potential binding partners. Overall, the research demonstrates that in silico approaches using AlphaFold2 can link bioinformatics predictions with experimental validation, streamlining the identification of novel protein interactions and reducing the reliance on extensive experimental efforts. The manuscript is commendably clear and easy to follow; however, areas for improvement should be addressed to enhance its clarity and rigor.

      Major Concerns:

      (1) While AlphaFold2 was developed and trained primarily for predicting protein structures and their interactions, applying it to predict protein-protein interactions is an extrapolation of its intended use. This introduces several important considerations and risks. First, it assumes that AlphaFold's accuracy in structure prediction extends to interactions, despite not being explicitly trained for this task. Additionally, the assumption that high-scoring models with structural complementarity imply biologically relevant interactions is not always valid. Experimental validation is essential to address these uncertainties, as over-reliance on computational predictions without such validation can lead to false positives and inaccurate conclusions. The authors should expand on the assumptions, limitations, and risks associated with using AlphaFold2 for predicting protein-protein interactions.

      We appreciate the reviewer's point. The prediction of protein-protein interactions using AlphaFold2 relies on the number of conserved homologous sequences and previous conformational data(8) (Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)). We added sentences explaining the limitations and risks of the AlphaFold2 prediction method to the Introduction and to the end of the Results and Discussion of the revised manuscript, respectively.

      Page 5, Line 67;

      “AlphaFold2 requires sequence homology information to predict protein-protein interactions and the complex structure model. The reliability of these predictions is basically dependent on the strength of co-evolutionary signals(9).”

      Page 6, Line 84;

      “AlphaFold2 was initially trained to predict the structure of individual proteins(8). Its application to complex prediction is an extrapolative use beyond its original intended scope, and its accuracy remains unverified. Even high-confidence predictions may not correspond to actual interactions, necessitating experimental validation to confirm whether predicted protein dimers truly bind.”

      Page 21, Line 361;

      “This study identifies several potential protein interactions, but AlphaFold2 predictions require caution. Protein-protein interactions involve conformational changes and dependencies on ligands, ions, and cofactors, which AlphaFold2 does not consider, potentially reducing prediction accuracy. Notably, the presence of a high-scoring model in terms of structural complementarity does not guarantee that the interaction is biologically significant.”

      (2) The authors experimentally validated three interactions, out of five predicted interactions, using co-immunoprecipitation (co-IP). They attributed the lack of validation for the other two predictions to the limitations of the co-IP method. However, further clarification on the potential limitations of the co-immunoprecipitation behind the negative results would strengthen the conclusions. While co-IP is a widely used technique, it may not detect weak or transient interactions, which could explain the failure to validate some predictions. Suggesting alternative validation methods such as FRET or mass spectrometry could further substantiate the results. On the other hand, AlphaFold2 predictions are not infallible and may generate false positives, particularly when dealing with structurally plausible but biologically irrelevant interactions. By acknowledging both the potential limitations of co-IP and the possibility of false positives from AlphaFold2, the authors can provide a more balanced interpretation of their findings.

      We appreciate the reviewer's point of view. We have used the co-IP method to detect interactions in this study. However, as the reviewer pointed out, it is likely that weak and transient interactions may not be detected. We added a note on the detection limits of the co-IP method and the possibility that AlphaFold2 method produces false positives in the revised manuscript.

      Page 12, Line 197;

      “While co-immunoprecipitation is a widely used method, it may not always detect weak or transient interactions. Other validation methods, such as FRET or co-localization assay in culture cells, could offer further insights to support the results. It is also important to note that AlphaFold2's predictions are not definitive and may lead to false positives, particularly when analyzing a large number of interactions.”

      (3) In line 143, the authors state that "This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formations (Table 2). The highest pcScore pair was the Zuc homodimer, possibly because AlphaFold2 had learned from Zuc homodimer's crystal structure registered in the database." While the authors mentioned the presence of the Zuc homodimer's crystal structure, they do not provide a systematic bioinformatics analysis to evaluate pairwise sequence identity or check for the presence of existing structures for all the proteins or protein pairs (or their homologs) in databases such as the Protein Data Bank (PDB) or Swiss-Model. Conducting such an analysis is critical, as it significantly impacts the novelty and reliability of AlphaFold2 predictions. For instance, high sequence identity between the query proteins could lead to high-scoring models for biologically irrelevant interactions. Including this information would strengthen the conclusions regarding the accuracy and utility of the predictions.

      We appreciate the reviewer's critical point. The AlphaFold2 method generates a high confidence score when the 3D structure of the protein of interest, or of proteins with very similar sequences, is solved. We investigated whether the proteins used in this study are included in the 3D structure database (PDB) and added the information as a supplemental table S2. The following sentences were added to explain the structural references that AlphaFold2 has learned in the revised manuscript.

      Page 9, Line 150;

      “The structures of the 20 proteins used in this study have been analyzed to varying extents in previous studies (Supplementary Table S2). A complex of Vas and the Lotus domain of Osk has been reported(20), and based on this complex structure, the interaction between Vas and Tej Lotus domain was predicted with a high score. Although the conformational analyses of the RNA helicase domain and the eTud domain have been reported previously, many of those cover only a subset of the regions and are unlikely to affect our predictions in this study.

      The predicted 3D structures and the Predicted Aligned Error (PAE) plots for the 12 pairs are shown in Fig. 1C.”

      (4) While the manuscript successfully identifies novel protein interactions, the broader biological significance of these interactions remains underexplored. The manuscript could benefit from elaborating on how these findings may contribute to understanding the piRNA pathway and its implications on germline development, transposon repression, and oogenesis.

      We added a discussion of the potential biological significance of the novel protein-protein interactions presented in this manuscript to the revised manuscript, as follows:

      Page 16, Line 268;

      “In this study, three novel protein-protein interactions were predicted and experimentally confirmed. AlphaFold2 also predicted the 3D structure of these complexes, providing insight into the important regions involved in complex formation. These predictions will provide fundamental information to elucidate nuage assembly. Nuage is thought to form by liquid-phase separation; however, direct protein-protein interactions likely occur within protein-dense nuage, facilitating RNA processing. Although the precise roles of individual interactions require further study, characterization of protein-protein interactions within nuage will help clarify the mechanism of piRNA production.”

      Reviewer #1 (Recommendations for the authors):

      Minor Concerns:

      (1) In the Materials and Methods section, the authors thoroughly describe the computational infrastructure (SQUID at Osaka University) and the use of AlphaFold2. However, it would greatly benefit the readers to include a detailed breakdown of the computational cost. Understanding the computational cost (in terms of time, CPU/GPU hours, or other relevant metrics) for predicting 3D structures, especially for 400 protein pairs, would provide valuable insight into the efficiency and scalability of the approach. This would enhance the practical relevance of the methodology section and offer a better understanding of the resources required, beyond just the infrastructure description.

      Thank you for your valuable suggestion. The following descriptions were added in the revised manuscript.

      Page 24, Line 403;

      “The calculation of the MSA took 2-4 hours per protein on average; the more homologs the query protein had, the longer it took.”

      Page 24, Line 409;

      “Prediction of dimer structure took approximately 1-2 hours per pair on average, depending on protein size. Each user can compute 100-200 pairs per day, but since the supercomputer is shared, job availability varies with overall demand.”
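
      As a rough illustration of the scale these figures imply, the sketch below multiplies the per-item times quoted above by the 20 proteins and 400 pairs mentioned in this response. It assumes the quoted ranges act as bounds and that jobs are counted as if run one after another; the function and variable names are hypothetical and are not part of the authors' pipeline.

```python
# Back-of-the-envelope cost estimate from the ranges quoted in the response.
# Assumption: jobs are counted as if run sequentially (no parallelism).

N_PROTEINS = 20             # proteins screened (from the response)
N_PAIRS = 400               # protein pairs (from the reviewer's comment)
MSA_HOURS = (2, 4)          # hours per protein for the MSA step
DIMER_HOURS = (1, 2)        # hours per pair for dimer prediction
PAIRS_PER_DAY = (100, 200)  # per-user throughput on the shared machine


def total_hours(n_items, hours_range):
    """Return (low, high) total hours for n_items given a per-item range."""
    low, high = hours_range
    return n_items * low, n_items * high


def days_to_finish(n_pairs, throughput_range):
    """Return (best, worst) number of days at the quoted daily throughput."""
    slow, fast = throughput_range
    return n_pairs / fast, n_pairs / slow


msa_total = total_hours(N_PROTEINS, MSA_HOURS)
dimer_total = total_hours(N_PAIRS, DIMER_HOURS)
wall_days = days_to_finish(N_PAIRS, PAIRS_PER_DAY)

print(f"MSA: {msa_total[0]}-{msa_total[1]} h")
print(f"Dimers: {dimer_total[0]}-{dimer_total[1]} h")
print(f"Wall time: {wall_days[0]:.0f}-{wall_days[1]:.0f} days")
```

      Under these assumptions, the dimer predictions dominate the total compute time, while the quoted daily throughput bounds how quickly a single user can finish the screen.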

      (2) The manuscript will benefit from a review for grammatical accuracy and clarity, especially in complex explanations. For example, in Line 160: "The predicted dimer structures of Me31B_Tral and Cup_Me31B showed the score of 0.74 and 0.68, respectively (Table 2)." could be revised to "The predicted dimer structures of Me31B_Tral and Cup_Me31B showed scores of 0.74 and 0.68, respectively."

      Thank you very much for pointing this out. The sentence has been corrected (Page 10, Line 170).

      (3) For the AlphaFold3 web server, please use (https://alphafoldserver.com/) instead of (https://golgi.sandbox.google.com/about).

      Thank you very much for pointing it out. The URL has been changed in the revised manuscript (Page 25, Line 422).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use AlphaFold2 to identify potential binding partners of nuage localizing proteins.

      Strengths:

      The main strength of the paper is that the authors experimentally verify a subset of the predicted interactions.

      Many studies have been performed to predict protein-protein interactions in various subsets of proteins. The interesting story here is that the authors (i) focus on an organelle that contains quite some intrinsically disordered proteins and (ii) experimentally verify some (but not all) predictions.

      Weaknesses:

      Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4-5000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      We thank the reviewer for the kind suggestions. In this study, protein dimers were screened on the assumption that the two proteins bind 1:1; in some cases, multiple binding partners were predicted for a single protein. For example, Spn-E was predicted to bind both Tej and Squ. Therefore, for Spn-E_Squ_Tej, we used the latest AlphaFold3 to predict the trimeric structure, which was already described in the original manuscript. In addition, as suggested by the reviewer, other possible trimer results were added to the revised manuscript as follows:

      Page 15, Line 249;

      “In addition to the Spn-E_Squ_Tej complex, the 1:1 dimer predictions described above further suggested potential trimers (Fig. 1; Supplemental Fig. S4). For example, Tej protein is predicted to bind both Vas and Spn-E, and AlphaFold3 indeed further predicted a Vas_Tej_Spn-E trimer, where Tej’s Lotus and eTud domains interact with Vas and Spn-E, respectively. However, Lin et al. reported that Tej binds either Vas or Spn-E, but not both simultaneously (17), in the Drosophila ovary, suggesting that the predicted trimers may be weak or transient. Similarly, the BoYb_Vret_Shu and the Me31B_Cup_Tral trimers remain hypothetical and require experimental verification (Supplemental Fig. S4).”

      Another weakness is the use of a non-standard name for "ranking confidence": the authors call it the pcScore, while the name used in AlphaFold (and many other publications) is ranking confidence.

      “pcScore” has been changed to “ranking confidence”.

      Reviewer #2 (Recommendations for the authors):

      (1) The pcScore is actually what is called RankingConfidence. Also, many other measures have been developed by other groups (based on PAE for instance) - these could be compared.

      Thank you for your valuable suggestions. While other indicators are being developed, we have computed the affinity of each complex from its predicted three-dimensional structure using the PRODIGY web server. The description was added to the revised manuscript as follows:

      Page 18, Line 300;

      “The ranking confidence score reflects the reliability of AlphaFold2's predicted structure but does not always ensure accuracy. Therefore, we assessed complex affinity based on the predicted three-dimensional structures (Supplemental Table S6). Most dimers with high ranking confidence scores exhibited low Kd values indicative of high affinity, while some showed high Kd values indicating weak interactions (Supplemental Table S6). For example, the Baf_Vas complex had a high AlphaFold2 ranking confidence score (0.85) but a relatively high Kd value (1.1E-4 M), indicating low affinity. Consistently, Baf_Vas binding was not detected in Co-IP experiments (Fig. S5C). Although accurate Kd prediction may be limited due to insufficient structural optimization, it could serve as a valuable secondary screening tool following AlphaFold2 predictions.”
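
      For readers less familiar with how a dissociation constant relates to binding free energy, the standard relation ΔG = RT ln(Kd) can be sketched as follows. The temperature of 25 °C and the 1 nM comparison value are assumptions chosen for illustration, not values stated in the manuscript, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_delta_g(kd_molar, temp_celsius=25.0):
    """Binding free energy (kcal/mol) from Kd (M): dG = R * T * ln(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


# Kd quoted above for the Baf_Vas complex, and a hypothetical 1 nM binder
# included only for comparison.
weak = kd_to_delta_g(1.1e-4)
tight = kd_to_delta_g(1e-9)
print(f"Kd 1.1e-4 M -> {weak:.1f} kcal/mol")
print(f"Kd 1e-9 M   -> {tight:.1f} kcal/mol")
```

      A less negative ΔG corresponds to a larger Kd and therefore weaker binding, which is the sense in which the Baf_Vas prediction is described above as low affinity.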

      (2) The FDR for binding to the PIWI protein needs to be estimated statistically. It is possible that 1.6% of random proteins (from another species, for instance) also obtain a ranking confidence over 0.6, i.e., how trustworthy are the predictions?

      Thank you for the insightful comments. Unfortunately, it is difficult to infer the FDR from the value of ranking confidence. Presumably, the accuracy will vary depending on the target protein, since the number of homologs and known conformational information will differ. In the case of Piwi, the FDR is expected to be relatively low since the conformation of the protein on its own has been experimentally determined. However, even for Piwi complexes with high values of ranking confidence, the estimated affinity varied from high to low (Supplemental Table S6). Therefore, it may be useful to conduct further secondary evaluation for AlphaFold2 predictions with high ranking confidence.

      (3) Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins up to 4-5000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      Already mentioned above.

      (4) The comparisons of ranking confidence vs ipTM/pTM are less interesting (by definition ranking confidence is virtually identical to ipTM).

      Thank you for the thoughtful comment. As the reviewer pointed out, there is not much difference between ranking confidence and ipTM shown in Fig. 1A. A high value of pTM (a well-folded structure) tends to increase ranking confidence, while a low value of pTM (many disordered regions) tends to decrease ranking confidence. Therefore, it may be useful to adjust the confidence threshold for each protein pair.
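
      The near-identity noted by the reviewer follows from how AlphaFold-Multimer ranks models, namely as a weighted sum of 0.8 × ipTM and 0.2 × pTM. The sketch below illustrates that weighting with hypothetical score values; it is not the authors' analysis code.

```python
def ranking_confidence(iptm, ptm):
    """AlphaFold-Multimer model ranking score: 0.8 * ipTM + 0.2 * pTM."""
    for name, value in (("iptm", iptm), ("ptm", ptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return 0.8 * iptm + 0.2 * ptm


# Hypothetical pairs with the same interface score but different fold confidence.
well_folded = ranking_confidence(iptm=0.60, ptm=0.90)
disordered = ranking_confidence(iptm=0.60, ptm=0.30)
print(round(well_folded, 2), round(disordered, 2))
```

      With a fixed cut-off of 0.6, the first hypothetical pair would pass and the second would not, although the predicted interface quality is identical. This is one way to read the suggestion that the threshold may need to depend on the protein pair.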

    1. Author response:

      We thank the reviewers for the detailed evaluations and thoughtful comments, which have improved the clarity and readability of this manuscript. We have responded to all reviewer comments and incorporated their suggested changes into the text and figures. We have also included new experimental results suggested by reviewer 2, which further strengthen our main conclusion.

      Point-by-point description of the revisions

      Reviewer #1:

      (1) Introduction, page 3: The statement "Single dimeric kinesin moves processively along microtubules in a hand-over-hand manner by alternately moving the two heads in an 8-nm step toward the plus-end of the microtubule" is inaccurate. The kinesin heads take ~16 nm steps, while the center of mass advances in ~8 nm increments. Please adjust the wording accordingly.

      (2) Introduction, page 5: In the sentence "These results are consistent with the closed and open conformations of the nucleotide-binding pocket in the rear and front heads of microtubule-bound kinesin dimers observed in cryo-electron microscopy (cryo-EM) studies," I recommend changing the order to align with the previous sentence. The correct order would be "These results are consistent with the open and closed conformations of the nucleotide-binding pocket in the front and rear heads."

      We thank the reviewer for pointing out our misunderstandings. We have corrected these sentences accordingly (lines 45-47 and lines 111-112).

      Reviewer #2:

      MAJOR CONCERNS

      Limitations of this study: The authors need to discuss the limitations of their work. 1) They used a cys-lite kinesin mutant and introduced new surface-exposed cysteines. These mutants have lower kcat values than WT. 2) They used fluorescently labeled ATP molecules, which are hydrolyzed 10 times slower than unlabeled nucleotides. 3) They still observe crosslinking under reducing conditions and partial (but almost complete) crosslinking under oxidized conditions. 4) They assumed that cysteine crosslinked orientation mimics the orientation of the neck-linker in the front and rear conditions. The authors clearly pointed to these issues in the Results section. While these assumptions are also supported by several control experiments, the authors need to acknowledge some of these limitations in the Discussion as well.

      We have now reiterated some of the key caveats in the Discussion, and newly described in the Results section those points not mentioned in the original manuscript that do not affect the conclusion. We also added a summary of the limitations and caveats into the first paragraph of the Discussion section (lines 425-431).

      (1) We added a sentence in the Results section to describe that the ATP-binding kinetics of the Cys-lite mutant remained consistent with previous studies as follows: “First, we demonstrated that k<sub>+1</sub> and k<sub>-1</sub> of the wild-type head without Cys-modification were unchanged after oxidization (Table 1) and were comparable to those previously reported (Cross, 2004)” (lines 163-166). The reduced kcat values of cysteine pair-added mutants before crosslinking were primarily due to reduced microtubule association rate (data not included in this manuscript). We have added a sentence in the Results section describing the kcat results as follows: “The reduced ATPase activity primarily results from a decreased microtubule association rate (data to be presented elsewhere) with little change in ATP binding or microtubule dissociation rates (Table 1).” (lines 144-146).

      (2) Fluorescently-labeled ATP was used to determine the ATP off-rates of the E236A mutant monomer and E236A rear head of the E236A/WT heterodimer. Two caveats in these measurements could lead to underestimating the ATP off-rate: 1) The off rate of Alexa-ATP from the head may be reduced compared to unmodified ATP, as Alexa-ATP driven motility showed a 10-fold reduced velocity. 2) The ATP off-rate of the E236A mutant may differ from that of the rear head in the wild-type dimer, since the E236A mutant likely stabilizes the neck linker-docked state more strongly than in the rear head of the wild-type dimer. These points are crucial for evaluating the results of ATP off-rate and the affinity for ATP, so we have added sentences in the Discussion section as follows: “We note, however, that this K<sub>d</sub> of ATP may somewhat underestimate the true value in wild-type kinesin for two reasons: first, the E236A mutation likely stabilizes the neck linker-docked, closed state more than in the rear head of the wild-type dimer (Rice et al., 1999), and second, the Alexa-ATP used to measure the ATP off-rate of E236A head showed ~10-fold smaller velocity compared to unmodified ATP, partly due to a slower ATP off-rate (Figure 2-figure supplement 3).” (lines 449-454).

      (3) Under reducing conditions, the rear head crosslink contained 30% crosslinked species, while under oxidized conditions, the front head crosslink contained 11% un-crosslinked species (Figure 1-figure supplement 1). These heterogeneities likely affect the rate constants k<sub>-1</sub> for the rear head crosslink and k<sub>2</sub> for the front head crosslink, as crosslinked and un-crosslinked species showed significantly different rate constants. However, we did not use the rear head crosslink result to determine k<sub>-1</sub>, since ATP hydrolysis likely occurred before reversible ATP dissociation. Instead, we used the E236A monomer to estimate the k<sub>-1</sub> of the rear head. In addition, the result for k<sub>2</sub> of the front head crosslink was further validated using the E236A/WT heterodimer, which will be described in the next section.

      (4) This is an important point, and therefore, we conducted experiments using the E236A/WT heterodimer (including new experimental results of ATP binding kinetics of the front head) and obtained consistent results. To address this point, we have revised the following sentences in the Discussion: “In the front head, backward orientation of the neck linker has little effect on ATP binding and dissociation rates, both when measured for a monomer crosslink (Figure 2A, B) and for the front head of an E236A-WT heterodimer (Figure 4B, C, F).” (lines 432-433); “However, we found that the ATP-induced detachment rates from microtubule (k<sub>2</sub>) were similarly reduced for both the front head crosslink (7.0 s<sup>-1</sup>; Figure 3A) and the front WT head of the E236A/WT heterodimer (6.3 s<sup>-1</sup>; Figure 6D), suggesting that a step subsequent to ATP binding is gated in the front head.” (lines 437-441).

      Line 238, the authors wrote that "forward constraint on the neck linker in the rear head does not significantly accelerate the detachment from the microtubule." Can the authors comment on why the rear-head-like construct has a low affinity for microtubules even in the absence of ATP (Line 220)? I believe that the low affinity of the head in this conformation is more striking (and potentially more important) than the changes they observe in detachment rates. The authors should also consider that they might not be able to reliably measure the changes in the dissociation rate in single molecule assays of this construct (especially if the release rate of the rear head in the oxidized condition rises well above that of WT). The kymographs show infrequent and brief events, which raises doubts about how reliably they can measure the release rates under those imaging conditions. Higher motor concentrations and faster imaging rates may address this concern.

      The low microtubule affinity of the rear-head-like crosslink stems from an extremely slow ADP release rate upon microtubule binding, not from a fast microtubule-detachment rate. Using stopped-flow measurements of microtubule-binding kinetics (microtubule-stimulated mant-ADP release and microtubule association rates), we found that the rear-head-crosslink resulted in a 2,000-fold decrease in the microtubule-stimulated ADP-release rate. This finding also explains the reduced ATPase of the rear-head-crosslink (Figure 1E). Since this low microtubule-affinity state occurs in the ADP-bound state rather than the ATP-bound state, we hypothesized that the neck-linker docked ADP-bound state cannot effectively bind to microtubules, requiring neck-linker undocking for microtubule binding (Mattson-Hoss et al., Proc. Natl. Acad. Sci., 111, 7000-7005 (2014)). While we acknowledge that understanding slow microtubule binding in the neck linker docked state is important for elucidating the mechanism and regulation of microtubule-binding of the head, this paper focuses specifically on the mechanism and regulation of “microtubule-detachment”. We plan to present these microtubule-binding kinetics data in a separate manuscript currently in preparation.

      To explain the low microtubule affinity of the rear-head-crosslink, we added this explanation to the text; “because this constraint on the neck linker dramatically reduces the microtubule-activated ADP release rate (data to be presented elsewhere), creating a weak microtubule binding state” (lines 226-228).

      Although the rear head crosslinking construct under oxidative conditions showed fewer fluorescent spots per kymograph (image) due to its low microtubule binding rate, we collected more than one hundred spots by recording additional microscope movies (N=140; Figure 3-figure supplement 2B), ensuring sufficient data for statistical analysis.

      Figure 2: How do the rates shown in Figure 2A-B compare to the previous kinetics studies in the field? The authors compare the dissociation rate of WT measured in rapid mixing experiments to that of E236A in smFRET assays. It is not clear whether these comparisons can be made reliably using different assays. Can the authors perform rapid mixing of E236A or try to determine the rate for the WT from smFRET trajectories?

      The measured ATP on/off rates are comparable to previous stopped-flow measurements of ATP binding to monomeric kinesin-1 on microtubules, which are 2-5 µM<sup>-1</sup>s<sup>-1</sup> and ~150 s<sup>-1</sup>, respectively (summarized in the review by Cross (2004)). We added a sentence as follows: “First, we demonstrated that k<sub>+1</sub> and k<sub>-1</sub> of the wild-type head without Cys-modification were unchanged after oxidization (Table 1) and were comparable to those previously reported (Cross, 2004).” (lines 163-166).
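
      To make the link between these rate constants and affinity explicit: for a simple one-step binding scheme, the equilibrium dissociation constant is K<sub>d</sub> = k<sub>off</sub>/k<sub>on</sub>. The sketch below applies this relation to the literature values quoted above. The one-step scheme is a simplifying assumption, and the function name is hypothetical.

```python
def dissociation_constant_uM(k_on_per_uM_s, k_off_per_s):
    """K_d in uM for one-step binding: K_d = k_off / k_on."""
    if k_on_per_uM_s <= 0:
        raise ValueError("k_on must be positive")
    return k_off_per_s / k_on_per_uM_s


K_OFF = 150.0            # s^-1, approximate value quoted from Cross (2004)
K_ON_RANGE = (2.0, 5.0)  # uM^-1 s^-1, range quoted from Cross (2004)

# The slowest on-rate gives the weakest affinity (largest K_d), and vice versa.
kd_high = dissociation_constant_uM(K_ON_RANGE[0], K_OFF)
kd_low = dissociation_constant_uM(K_ON_RANGE[1], K_OFF)
print(f"K_d between {kd_low:.0f} and {kd_high:.0f} uM")
```

      Because K<sub>d</sub> depends on the ratio of the two rates, a change in the off-rate alone, such as the one attributed to forward neck linker strain, is enough to shift the apparent ATP affinity.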

      As the reviewer pointed out, the rapid mixing and smFRET data cannot be directly compared due to the differences in temporal resolution and fluorescent probe used. In Figure 2E (2F in the revised version), we measured ATP dissociation rate for both WT and E236A using smFRET. Due to the lower temporal resolution, we could not accurately determine ATP binding rate using smFRET. Therefore, to compare the ATP binding rate between WT and E236A heads, we now have added stopped-flow measurements of mant-ATP binding to the E236A monomer, as shown in Fig. 2C and Figure 2-supplement 2, and described in the text (lines 182-185).

      Line 396: One of the most significant conclusions of this work is that the backward orientation of the neck linker has little effect on ATP binding to the front head. This is only supported by the results shown in Fig. 2A-B. Can the authors perform/analyze smFRET assays on the E236A/WT heterodimer to directly show whether the ATP binding rate to the WT head is affected or not affected by the orientation of the neck linker of the WT head?

      We agree with the reviewer that our finding about ATP binding to the front head is potentially significant in the kinesin field, as it has been widely believed that ATP-binding is suppressed in the front head. In our original manuscript, this conclusion was supported only by the measurement of ATP on-rate of the front-head-crosslink, which may differ from the front head of a dimer in which the backward orientation of the neck linker is maintained by the backward strain. Although the reviewer suggested performing smFRET experiments using E236A/WT heterodimer, smFRET has relatively low temporal resolution (50-100 fps) and cannot accurately measure the frequency of ATP binding, so we used this technique only to determine ATP off rates. In this revised manuscript, we now have added stopped-flow experiments to separately measure the ATP binding to the front and rear heads of the E236A/WT heterodimer. By labeling the rear E236A head with a fluorophore to quench the mant-ATP signal bound to the rear head, we successfully measured mant-ATP binding rate to the front head. We found that the ATP-binding rate to the front head was comparable to that of an unconstrained monomer head, providing direct evidence for our conclusion. The revised version includes Fig. 4 A-C (with Figure 4-supplement 2; Figs. 4 and 5 are swapped in order) showing the kinetics of ATP binding to the front and rear heads of the E236A/WT heterodimer, with corresponding text in the Results section (lines 315-324).

      MINOR CONCERNS

      Lines 31 and 32: I recommend replacing "ATP affinity" with "ATP binding rate" or "the dissociation of ATP" to be more specific. This is because they do not directly measure the affinity (Kd), but instead measure the on or off rates.

      Line 41: Replace "cellar" with "cellular".

      Line 83: The authors should cite Andreasson et al. here.

      We have corrected these sentences accordingly (lines 31, 40, 85).

      Lines 83-86: It seems this sentence belongs to the next paragraph. It also needs a citation(s).

      This statement lacks experimental evidence and may confuse readers, so we have removed it for clarity.

      Line 151: It would be helpful to add a conclusion sentence at the end of this paragraph to explain what these results mean to the reader.

      A concluding sentence has been added to this paragraph: “These results demonstrate that neck linker constraints in both forward and rearward orientations inhibit specific steps in the mechanochemical cycle of the head” (lines 151-153).

      Lines 175-180: I recommend combining and shortening these sentences, as follows, to avoid confusing the reader: "To detect the ATP dissociation event of the rear head, we employed a mutant kinesin with a point mutation of E236A in the switch II loop, which almost abolishes ATPase hydrolysis and traps in the microtubule-bound, neck-linker docked state,"

      We have corrected these sentences accordingly (lines 179-181).

      Line 314: "which was rarely observed ...". This is out of place and confusing as is. I recommend moving this sentence after the sentence that ends in Line 295.

      This sentence explains how the dark-field microscopy data was analyzed to determine whether the labeled head was in the leading or trailing position before detaching from the microtubule, but the explanation needs clarification. We removed the phrase “which was rarely observed for E236A-WT heterodimer” and simplified this sentence as follows: “Moreover, these observations allow us to distinguish whether the gold-labeled WT head was in the leading or trailing position just before microtubule detachment; the backward displacement of the detached head indicates that the labeled WT head occupied the leading position prior to detachment (Figure 5-figure supplement 1).” (lines 347-351).

      Line 300: Can the authors comment on why E236A/WT has a substantially lower ATPase rate than WT homodimer? Is it possible to determine which step in the catalytic cycle is inhibited?

      We demonstrated that the k<sub>2</sub> (microtubule-detachment rate) of the front head matched the ATP turnover rate of the E236A/WT heterodimer (Figure 6 B and E), suggesting that the inhibited step occurs after ATP binding in the front head. In contrast, the rear E236A head showed virtually no ATP hydrolysis activity, since in high-speed dark field microscopy, we only rarely observed forward steps caused by detachment of the rear E236A head from the microtubule, approximately once every few seconds (Figure 5-figure supplement 1). We added a sentence in the text as follows: “As described later, the reduced ATPase rate results from suppressed microtubule detachment of the front WT head, while the rear E236A head is virtually unable to detach from microtubules” (lines 311-313).

      Line 323: Is the unbound dwell time unchanged?

      The unbound dwell time exhibited a weak ATP-dependence, which we described only in Figure 5-supplement 2 (Figure 4-supplement 2 in the old version). We observed three distinct phases in the unbound dwell time based on mobility differences, with ATP dependence appearing only in the third phase. This finding suggests that ATP binding to the microtubule-bound E236A head is sometimes necessary for the detached WT head to rebind to the forward-tubulin binding site, indicating that the microtubule-bound E236A head occasionally releases ATP during the one-head-bound state (without the forward neck linker strain). To describe the ATP-dependence of the unbound dwell time, we added a sentence in the main text as follows: “In contrast, the dwell time of the unbound state of the gold-labeled WT head showed weak ATP dependence (Figure 5-figure supplement 2), indicating that the rear E236A head occasionally releases ATP when the front head detaches from the microtubule and the neck linker of E236A head becomes unconstrained. This finding further supports the idea that forward neck linker strain plays a crucial role in reducing the reversible ATP release rate.” (lines 372-377).

      Line 331: I recommend replacing "ATP-induced detachment" with "nucleotide-induced detachment" for clarity.

      We have revised the phrase accordingly (line 371).

      Line 344: I recommend replacing "affinity" with "forward strain prevents the release of the nucleotide" or similar to avoid confusion. Forward strain reduces the off-rate of the bound nucleotide, rather than allowing ATP to bind more efficiently to the rear head.

      We agree with the reviewer’s comment and have corrected this sentence accordingly (line 338).

      Lines 376-385: G7-12 constructs are introduced in Figure 6, but the results in this paragraph are shown in Figure 5. They should be moved to Figure 6 to avoid confusion.

      To improve the readability, we have reorganized Figures 4-6, such that all the figure panels related to the neck linker extended mutants are shown in Figure 6; Figure 5D has been moved to Figure 6F.

      Line 421: delete "not" before "does not".

      We have corrected this typo.

      Lines 433-441: Unless I am mistaken, more recent work in the kinesin field showed that backward trajectories of kinesin 1 reported by Carter and Cross are due to slips from the microtubule rather than backward processive runs of the motor.

      The slip motion demonstrated by Sudhakar et al. (2021) differs from the backstep motion reported by Carter and Cross (and many other laboratories). Slip motion occurs after kinesin detaches from the microtubule and continues until the bead returns to the trap center. In contrast, backstep motion occurs during processive movement when the trap force either exceeds or approaches the stall force. The kinetics of these motions also differ significantly: slip steps occur with a dwell time of 71 µs and are independent of ATP concentration, while backsteps take ~0.3 s (at 1 mM ATP) and depend on ATP concentration. These differences indicate that slip motion is phenomenologically distinct from backsteps occurring under supra-stall or near-stall force.

      Line 474: Replace "suppresses" with "suppressed".

      We have corrected this typo.

      Figure 4E: I would plot these results with increasing ATP concentration on the x-axis.

      We formatted Figure 4E to match Figure 4b from Isojima et al. (Nature Chem. Biol. 2015), to emphasize the difference in ATP dependence of the front and rear head.

      Figure 4B: The authors should explain how they distinguish between bound and unbound states in the main text or figure legends. For example, it is not clear how the authors score when the motor rebinds to the microtubule in the first unbinding event shown in Figure 4B (displacement plot).

      The method was described in the Materials and Methods section, but we have now described how to distinguish between bound and unbound states in the main text as follows: “Unlike the unbound trailing head of wild-type dimer that showed continuous mobility (Isojima et al., 2016), the unbound WT head of E236A-WT heterodimer exhibited a low-fluctuation state in the middle (Figure 5B, s.d. trace). This low-fluctuation unbound state was distinguishable from the typical microtubule-bound state, having a shorter dwell time of ~5 ms compared to the bound state and positioning backward, closer to the E236A head, relative to the bound state (Figure 5-figure supplement 2).” (lines 351-356).

      Reviewer #3:

      Minor Issues:

      - Line 22, Abstract - The phrase "move in a hand-over-hand manner" could be clearer if phrased as "move in a hand-over-hand fashion" to improve readability.

      We changed the word “manner” to “process” (line 23).

      - Abstract - Neck linker conformation in the leading head: The sentence "We demonstrate that the neck linker conformation in the leading kinesin head increases microtubule affinity without altering ATP affinity" would benefit from defining this conformation as "backward" for clarity.

      - Abstract - Neck linker conformation in the trailing head: The sentence "The neck linker conformation in the trailing kinesin head increases ATP affinity by several thousand-fold compared to the leading head, with minimal impact on microtubule affinity" should also clarify that this conformation is "forward."

      We have corrected these sentences accordingly (lines 30, 32).

      - Abstract - Conformation-specific effects: The authors mention conformation-specific effects in the neck linker structure but do not define the neck linker's conformation or the motor domain's (MD) conformation. Clarifying these conformational changes would improve the explanation of how they promote ATP hydrolysis and dissociation of the trailing head before the leading head detaches from the microtubule, thereby providing a kinetic basis for kinesin's coordinated walking mechanism.

      We have revised the last sentence of the abstract accordingly by specifying the neck linker’s conformation as follows: “In combination, these conformation-specific effects of the neck linker favor ATP hydrolysis and dissociation of the rear head prior to microtubule detachment of the front head, thereby providing a kinetic explanation for the coordinated walking mechanism of dimeric kinesin.” (lines 34-37).

      - Line 306 - Use of ATP in the E236A-WT heterodimer: In discussing the "ATP-induced detachment rate of the WT head in the E236A-WT heterodimer," the authors should consider justifying their choice of ATP over ADP for inducing microtubule (MT) dissociation. Since ATP typically promotes tighter MT binding and ATP turnover is reduced in forward-positioned WT heads, it may be unclear to some readers why ATP was chosen.

      We measured the ATP-induced detachment rate k<sub>2</sub> of the front head of the E236A-WT heterodimer to validate our findings from the front-head-crosslinked monomer experiments, which demonstrated reduced k<sub>2</sub> after oxidation. To clarify this point, we have now included ATP binding kinetics measurements for both front and rear heads of the E236A-WT heterodimer, as suggested by reviewer 2. These additional data demonstrate consistency between the results from the crosslinked monomer and E236A-WT heterodimer experiments.

      - Discussion - Backward-oriented neck linker in the front head: The discussion mentions that the backward-oriented neck linker in the front head reduces its ATP-induced detachment rate, suggesting that a step after ATP binding (e.g., isomerization, ATP hydrolysis, or phosphate release) is gated in the front head. However, the authors do not clarify that the backward neck linker orientation would imply the nucleotide pocket should be open or at least not fully closed, thus inhibiting ATP turnover. This is important because, as demonstrated in other studies, full closure of the nucleotide pocket is linked to neck linker docking. This point should be addressed earlier in the discussion.

      We have addressed this point by revising this sentence as follows: “These results are consistent with an inability of the front head to fully close its nucleotide pocket to promote ATP hydrolysis and Pi release (Benoit et al., 2023), as will be discussed later.” (lines 441-443)

    1. Author response:

      We thank the reviewers for their thorough review of our manuscript and their constructive feedback. We will address their comments and concerns in a point-by-point response at a later stage but would like to clarify some minor misunderstandings in the meantime so as not to confuse readers.

      - In regard to population ablation: When investigating the contribution of population size to reconstruction quality, we used 12.5, 25, 50 or 100% of the recorded neuronal population, which corresponds to ~1000/2000/4000/8000 neurons per animal. We did not produce reconstructions from only 1 neuron.

      - In regard to the training of the transparency masks: The transparency masks were not produced using the same movies we reconstructed. We apologize for the lack of clarity on this point in the manuscript. We calculated the masks using an original model instance rather than the retrained instances used in the rest of the paper. Specifically, the masks were calculated using the original model instance ‘fold 1’ and data fold 1, which is its validation fold. In contrast, the model instances used in the paper for movie reconstruction were retrained while omitting the same validation fold across all instances (fold 0) and all the reconstructed movies in the paper are from data fold 0.

      - In regard to reconstruction based on predicted activity: We always reconstructed the videos based on the true neural responses, not the predicted neural responses, with the exception of the Gaussian noise and drifting grating stimuli in Figure 4 and Supplementary Figure S2, where no recorded neural activity was available.
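
      As a rough illustration of the population ablation described in the first point, the following minimal sketch draws random subsets of a recorded population at the stated fractions. The population size of 8000 and the helper name `subsample_population` are assumptions made for illustration only; the response gives only approximate counts (~1000/2000/4000/8000 neurons per animal) and does not describe how subsets were selected.

```python
import random


def subsample_population(neuron_ids, fraction, seed=0):
    """Return a sorted random subset containing `fraction` of the neurons.

    Hypothetical helper: uniform random sampling is an assumption,
    not a description of the authors' procedure.
    """
    rng = random.Random(seed)
    n_keep = round(len(neuron_ids) * fraction)
    return sorted(rng.sample(neuron_ids, n_keep))


# Assumed population size; the response reports roughly 8000 neurons per animal.
neuron_ids = list(range(8000))

for fraction in (0.125, 0.25, 0.5, 1.0):
    subset = subsample_population(neuron_ids, fraction)
    print(f"{fraction:.1%} of the population -> {len(subset)} neurons")
```

      Under these assumptions, the smallest subset still contains about 1000 neurons, which matches the statement that no reconstruction was produced from a single neuron.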

    1. Author response:

      We thank both reviewers for their suggestions on improving our manuscript, which is focused on demonstrating that the C3a-C3aR axis modulates trained immune responses in alveolar macrophages. The Short Report format precludes separating the Results and Discussion sections. However, we will work towards a clearer presentation of findings and a more comprehensive interpretation of the data in the Revision by addressing the points brought up by both Reviewers.

      We agree with the suggestions from Reviewer 1 that (1) other cell types such as dendritic cells, neutrophils, and endothelial cells can also be involved in immune training, and (2) macrophages have other activities beyond releasing inflammatory cytokines, and will clarify both these points in the Revision. The mechanism of C3 being cleaved intracellularly and binding to lysosomal C3aR involves cathepsin-dependent cleavage of C3 to C3a and has been experimentally proven (Liszewski et al. Immunity 2013). However, we will clarify this mechanism in the revision. We also acknowledge that the observations need to be validated in human-based models. Currently, we do not have access to an adequate representation of human alveolar macrophages for our ex vivo testing to account for individual-level variation in immune responses. However, we anticipate this work will form the basis of these future studies.

      We also appreciate Reviewer 2’s suggestions regarding demonstrating the resolution of acute inflammation after the initial exposure to heat-killed Pseudomonas. We will address this critique by performing additional experiments, which will be included in the Revision. We also agree that the responses of trained C3-deficient cells should be compared to untrained C3-deficient controls after the LPS challenge. We will include this data in the Revision, in addition to the requested data for Figures 3 and 4. We would like to clarify that we do not observe baseline differences between untrained C3-sufficient (wildtype) and C3-deficient alveolar macrophages, even in their glycolytic capacity, and thus, anticipate that our revised data will strengthen the conclusions from the original manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to characterize neurocomputational signals underlying interpersonal guilt and responsibility. Across two studies, one behavioral and one fMRI, participants made risky economic decisions for themselves or for themselves and a partner; they also experienced a condition in which the partners made decisions for themselves and the participant. The authors also assessed momentary happiness intermittently between choices in the task. Briefly, results demonstrated that participants' self-reported happiness decreased after disadvantageous outcomes for themselves and when both they and their partner were affected; this effect was exacerbated when participants were responsible for their partner's low outcome, rather than the opposite, reflecting experienced guilt. Consistent with previous work, BOLD signals in the insula correlated with experienced guilt, and insula-right IFG connectivity was enhanced when participants made risky choices for themselves and safe choices for themselves and a partner.

      Strengths:

      This study implements an interesting approach to investigating guilt and responsibility; the paradigm in particular is well-suited to approach this question, offering participants the chance to make risky v. safe choices that affect both themselves and others. I appreciate the assessment of happiness as a metric for assessing guilt across the different task/outcome conditions, as well as the implementation of both computational models and fMRI.

      We thank Reviewer 1 for their positive assessment of our manuscript.

      Weaknesses:

      In spite of the overall strengths of the study, I think there are a few areas in which the paper fell a bit short and could be improved.

      We are looking forward to improving our manuscript based on the Reviewers’ comments. According to eLife’s policy, here are our provisional replies as well as plans for changes.

      (1) While the framing and goal of this study was to investigate guilt and felt responsibility, the task implemented - a risky choice task with social conditions - has been conducted in similar ways in past research that were not addressed here. The novelty of this study would appear to be the additional happiness assessments, but it would be helpful to consider the changes noted in risk-taking behavior in the context of additional studies that have investigated changes in risky economic choice in social contexts (e.g., Arioli et al., 2023 Cerebral Cortex; Fareri et al., 2022 Scientific Reports).

      We certainly agree that several previously published studies have relied on risky choice tasks with social conditions. We will happily refer to the studies mentioned when discussing changes in risk-taking behaviour in our revised manuscript.

      (2) The authors note they assessed changes in risk preferences between social and solo conditions in two ways - by calculating a 'risk premium' and then by estimating rho from an expected utility model. I am curious why the authors took both approaches (this did not seem clearly justified, though I apologize if I missed it). Relatedly, in the expected utility approach, the authors report that since 'the number of these types of trials varied across participants', they 'only obtained reliable estimates for [gain and loss] trials in some participants' - in study 1, 22 participants had unreliable estimates and in study 2, 28 participants had unreliable estimates. Because of this, and because the task itself only had 20 gains, 20 losses, and 20 mixed gambles per condition, I wonder if the authors can comment on how interpretable these findings are in the Discussion. Other work investigating loss aversion has implemented larger numbers of trials to mitigate the potential for unreliable estimates (e.g., Sokol-Hessner et al., 2009).

      We agree that we have not clearly justified why we have taken two approaches to assess risk preferences. In short, both approaches have advantages and drawbacks when applied to our experiment. We will happily detail our reasons in the revised manuscript. Regarding the second point of this comment: the small number of reliable estimates is one of the reasons that we have used another approach to assess risk preferences. We would certainly have obtained more reliable estimates if we had implemented more trials. We will discuss the interpretability of all the risk preference estimates we used in the revised Discussion.
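
      To make the two kinds of measure concrete, here is a minimal sketch using textbook definitions, which are assumptions and may differ from those used in the manuscript: a power utility u(x) = x^rho for gains, and a risk premium defined as the expected value of a gamble minus its certainty equivalent. The gamble values and the function names are hypothetical.

```python
def expected_value(outcomes, probs):
    """Expected monetary value of a gamble."""
    return sum(p * x for x, p in zip(outcomes, probs))


def certainty_equivalent(outcomes, probs, rho):
    """Certainty equivalent under power utility u(x) = x**rho (gains only).

    Assumed textbook form; rho < 1 implies risk aversion for gains.
    """
    expected_utility = sum(p * (x ** rho) for x, p in zip(outcomes, probs))
    return expected_utility ** (1.0 / rho)


def risk_premium(outcomes, probs, rho):
    """Expected value minus certainty equivalent (assumed definition)."""
    return expected_value(outcomes, probs) - certainty_equivalent(outcomes, probs, rho)


# Hypothetical 50/50 gamble between 0 and 100 for a risk-averse agent (rho = 0.5).
outcomes, probs = [0.0, 100.0], [0.5, 0.5]
print(expected_value(outcomes, probs))             # expected value of the gamble
print(certainty_equivalent(outcomes, probs, 0.5))  # sure amount judged equivalent
print(risk_premium(outcomes, probs, 0.5))          # positive for a risk-averse agent
```

      In this framing, a model-free premium can be computed from choices without fitting a utility function, whereas rho requires enough trials of each type to be estimated reliably.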

      (3) One thing seemingly not addressed in the Discussion is the fact that the behavioral effect did not replicate significantly in study 2.

      We agree that we could have discussed in more depth the fact that there were (slight but significant) differences in risk preferences between the Solo and Social conditions in Study 1 but not in Study 2. While the absence of a significant difference in Study 2 is helpful to compare the neural mechanisms involved in making decisions for oneself vs. for oneself and another person (because any differences could not be explained by differences in risk preferences), we certainly should expand our discussion of the differences in findings between the two studies, which we will do in the revised manuscript.

      (4) Regarding the computational models, the authors suggest that the Responsibility and Responsibility Redux models provided the best fit, but they are claiming this based on separate metrics (e.g., in study 1, the redux model had the lowest AIC, but the responsibility only model had the highest R^2; additionally, the basic model had the lowest BIC). I am wondering if the authors considered conducting a direct model comparison to statistically compare model fits.

      We agree that we should run formal, direct model comparison tests using, for example, chi-square or log-likelihood-ratio tests. We will do so in the revised manuscript.
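
      For readers unfamiliar with these comparisons, the sketch below shows how AIC, BIC, and a likelihood-ratio test relate to a model's log-likelihood. It assumes two nested models that differ by exactly one free parameter, so that the test statistic is referred to a chi-square distribution with one degree of freedom; all log-likelihoods, parameter counts, and sample sizes are made-up values, not results from the study.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def likelihood_ratio_test_df1(log_lik_reduced, log_lik_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns the statistic 2 * (ln L_full - ln L_reduced) and its p-value
    under a chi-square distribution with 1 degree of freedom, using the
    identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (log_lik_full - log_lik_reduced)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Made-up values for illustration only.
ll_reduced, k_reduced = -520.0, 4
ll_full, k_full = -515.0, 5
n_obs = 300

print("AIC:", aic(ll_reduced, k_reduced), aic(ll_full, k_full))
print("BIC:", bic(ll_reduced, k_reduced, n_obs), bic(ll_full, k_full, n_obs))
print("LRT:", likelihood_ratio_test_df1(ll_reduced, ll_full))
```

      Because BIC penalizes extra parameters more strongly than AIC once the number of observations exceeds about eight, the two criteria can favour different models, which is one way the disagreement between metrics noted by the reviewer can arise.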

      (5) In the reporting of imaging results, the authors report in a univariate analysis that a small cluster in the left anterior insula showed a stronger response to low outcomes for the partner as a result of participant choice rather than from partner choice. It then seems as though the authors performed small volume correction on this cluster to see whether it survived. If that is accurate, then I would suggest that this result be removed because it is not recommended to perform SVC where the volume is defined based on a result from the same whole-brain analysis (i.e., it should be done a priori).

      As indicated in the manuscript, the small insula cluster centered at [-28 24 -4] and shown in Figure 4F survived corrections for multiple tests within the anatomically-defined anterior insula (based on the anatomical maximum probability map described in Faillenot et al., 2017), which is independent of the result of our analysis. We agree that one should not (and we did not) perform correction for multiple tests within a volume defined by the results one is correcting – that would indeed be circular and misleading “double-dipping”. The anterior insula is one of the regions most frequently associated with guilt (see the explanations in our Introduction, which refers for example to Bastin et al., 2016; Lamm & Singer, 2010; Piretti et al., 2023). Thus we feel that performing small-volume correction within the anatomically-defined anterior insula is an acceptable approach to correct for multiple tests in this case. We fully acknowledge that, independently of any correction, the effect and the cluster are small. We will clarify these explanations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      This manuscript focuses on the role of social responsibility and guilt in social decision-making by integrating neuroimaging and computational modeling methods. Across two studies, participants completed a lottery task in which they made decisions for themselves or for a social partner. By measuring momentary happiness throughout the task, the authors show that being responsible for a partner's bad lottery outcome leads to decreased happiness compared to trials in which the participant was not responsible for their partner's bad outcome. At the neural level, this guilt effect was reflected in increased neural activity in the anterior insula, and altered functional connectivity between the insula and the inferior frontal gyrus. Using computational modeling, the authors show that trial-by-trial fluctuations in happiness were successfully captured by a model including participant and partner rewards and prediction errors (a 'responsibility' model), and model-based neuroimaging analyses suggested that prediction errors for the partner were tracked by the superior temporal sulcus. Taken together, these findings suggest that responsibility and interpersonal guilt influence social decision-making.

      Strengths

      This manuscript investigates the concept of guilt in social decision-making through both statistical and computational modeling. It integrates behavioral and neural data, providing a more comprehensive understanding of the psychological mechanisms. For the behavioral results, data from two different studies is included, and although minor differences are found between the two studies, the main findings remain consistent. The authors share all their code and materials, leading to transparency and reproducibility of their methods.

      The manuscript is well-grounded in prior work. The task design is inspired by a large body of previous work on social decision-making and includes the necessary conditions to support their claims (i.e., Solo, Social, and Partner conditions). The computational models used in this study are inspired by previous work and build on well-established economic theories of decision-making. The research question and hypotheses clearly extend previous findings, and the more traditional univariate results align with prior work.

      The authors conducted extensive analyses, as supported by the inclusion of different linear models and computational models described in the supplemental materials. Psychological concepts like risk preferences are defined and tested in different ways, and different types of analyses (e.g., univariate and multivariate neuroimaging analyses) are used to try to answer the research questions. The inclusion and comparison of different computational models provide compelling support for the claim that partner prediction errors indeed influence task behavior, as illustrated by the multiple model comparison metrics and the good model recovery.

      We thank Reviewer 2 very much for their comprehensive description of our study and the positive assessment of our study and approach.

      Weaknesses

      As the authors already note, they did not directly ask participants to report their feelings of guilt. The decrease in happiness reported after a bad choice for a partner might thus be something else than guilt, for example, empathy or feelings of failure (not necessarily related to guilt towards the other person). Although the patterns of neural activity evoked during the task match with previously found patterns of guilt, there is no direct measure of guilt included in the task. This warrants caution in the interpretation of these findings as guilt per se.

      We fully agree that not directly asking participants about feelings of guilt is a clear limitation of our study. While we already mention this in our Discussion, we will happily expand our discussion of the consequences on interpretation of our results along the lines described by the reviewer in the revised manuscript. We would like to thank Reviewer 2 for proposing these lines of thought.

      As most comparisons contrast the social condition (making the decision for your partner) against either the partner condition (watching your partner make their decision) or the solo condition (making your own decision), an open question remains of how agency influences momentary happiness, independent of potential guilt. Other open questions relate to individual differences in interpersonal guilt, and how those might influence behavior.

      We fully agree that the way agency influences happiness has not been much discussed in our manuscript so far, and we will happily do so in the revised manuscript. The same goes for individual differences in interpersonal guilt, which we have not investigated due to our relatively small sample sizes but which would certainly be worth investigating in subsequent work.

      This manuscript is an impressive combination of multiple approaches, but how these different approaches relate to each other and how they can aid in answering slightly different questions is not very clearly described. The authors could improve this by more clearly describing the different methods and their added value in the introduction, and/or by including a paragraph on implications, open questions, and future work in the discussion.

      We again thank the reviewer for their praise of our approach and fully agree that we can improve the description of the benefit of combining methods in the Introduction, which we will do in the revised manuscript. We will also include a paragraph on implications, open questions, and future work in the Discussion of the revised manuscript.

      However, taken together, this study provides useful insights into the neural and behavioral mechanisms of responsibility and guilt in social decision-making, and how they influence behavior.

      We again thank Reviewer 2 for their attentive reading and thoughtful comments and look forward to submitting our revised and improved manuscript.

    1. Author response:

      Reviewer 1:

      (1) We appreciate the reviewer’s suggestion to test a multi-attribute attentional drift-diffusion model (maaDDM) that does not constrain the taste and health weights to the range of 0 and 1 and will test such a model.

      (2) Similarly, we will follow the reviewer’s suggestion to address potential demand effects. First, we will add “order” (binary: hungry-sated or sated-hungry) as a predictor to our GLMM, to test for potential systematic effects of order on choices and response times. Second, we will split the participants by “order” and examine whether we see group differences of tasty and healthy decisions within the first testing session. Note that we already anticipate that looking at only 50% of the data and testing for a between-subject rather than within-subject effect is likely to reduce effect size and statistical sensitivity.

      (3) We thank the reviewer for their observant remark about faster tasty choices and potential markers in the drift rate. While our starting point models show that there might be a small starting point bias towards the taste boundary, which results in faster decisions, we will take a closer look at the simulated value differences as obtained in our posterior predictive checks to see if the drift rate is systematically more extreme for tasty choices.

      (4) Regarding the mtDDM, we will verify that the relative starting time (rst) effects are minuscule. While we will follow the recommendation of correlating first fixations with rst, we would like to point out that a majority of fixations (see Figure 3b) and first fixations (see Figure S6b) are on food images. We will also provide a parameter recovery of the mtDDM.

      Reviewer 2:

      (1) We would like to verify the reviewer’s interpretation that hungry people in negative calorie balance simply prefer more calories and would like to point to our supplementary analyses, in which we show that hunger state also increases the probability of higher wanted and higher caloric decisions (see SOM4, SOM5, Figure S4). Moreover, we agree that high caloric items might not be unhealthy and are happy to report the correlations between health ratings and objective caloric content, to demonstrate the strong negative correlation in our dataset, which our principal component analysis also hints at.

      Reviewer 3:

      (1) We agree that choosing tasty over healthy options under hunger may be evolutionarily adaptive. We will address the adaptiveness of this hunger-driven mechanism in our discussion, reiterating the differentiation made in the introduction that this system may no longer be adaptive in our obesogenic environment, leading to suboptimal decisions.

      (2) We will address alternative explanations of the observed effects in our discussion with respect to the macro-nutritional content of the shake and potential placebo effects arising from the shake vs. no-shake manipulation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the ADGF gene aggregate but do not form tips. A remarkable result, shown in several different ways, is that the ADGF mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the ADGF mutant such as increased mound size, altered cAMP signalling, and abnormal cell type differentiation. It appears that the ADGF mutant has defects in the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signalling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      (1) The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development.

      Diffusion of a gas can affect the signalling process of the entire colony of cells and will be quicker than other signalling mechanisms. A number of findings suggest that ammonia acts as both a local and long-range regulatory signal, integrating environmental and cellular cues to coordinate multicellular development. Ammonia serves as a crucial signalling molecule, influencing both multicellular organization and differentiation in Dictyostelium (Francis, 1964; Bonner et al., 1989; Bradbury and Gross, 1989). By raising the pH of the intracellular acidic vesicles of prestalk cells (Poole and Ohkuma, 1981; Gross et al., 1983) and of the cytoplasm, ammonia is known to increase the speed of chemotaxing amoebae (Siegert and Weijer, 1989; Van Duijn and Inouye, 1991), triggering multicellular movement (Bonner et al., 1988, 1989) to favor tipped mound development. The slug tip is known to release ammonia, while the slime sheath at the back of the slug prevents diffusion, thus maintaining high ammonia levels (Bonner et al., 1989) to promote prespore differentiation (Newell et al., 1969). Ammonia has been found to favor slug migration rather than fruiting (Schindler and Sussman, 1977) and thus, tip-derived ammonia may stimulate synchronized development of the entire colony. The tip exerts negative chemotaxis towards ammonia, potentially directing the slugs away from each other to ensure equal spacing of fruiting bodies (Feit and Sollitto, 1987).

      Ammonia released in pulses acts as a long-distance signalling molecule between colonies of yeast cells, indicating depletion of nutrient resources and promoting synchronous development (Palkova et al., 1997; Palkova and Forstova, 2000). A similar mechanism may be at play to influence neighbouring Dictyostelium colonies. Furthermore, ammonia produced in millimolar concentrations (Schindler and Sussman, 1977) may also ward off predators in soil, as observed in Streptomyces symbionts of leaf-cutting ants to inhibit fungal pathogens (Dhodary and Spiteller, 2021). Additionally, ammonia may be recycled into amino acids within starving Dictyostelium cells to support survival and differentiation, as observed in breast cancer cells (Spinelli et al., 2017). Therefore, using a diffusible gas like ammonia as a signalling molecule is likely to have bioenergetic advantages. Ammonia is a natural metabolic byproduct of amino acid catabolism and other cellular processes, making it readily available without requiring additional energy for synthesis. Instead of producing a dedicated signalling molecule, cells can exploit an existing by-product for developmental regulation.

      (2) The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound.

      Ammonia is known to influence rapid patterning of Dictyostelium cells confined in a restricted environment (Sawai et al., 2002). Both neutral red staining (a marker for prestalk and ALCs) (Fig. S2) and the prestalk marker ecmA/ecmB expression (Fig. 8C) in the adgf mutants suggest that the mounds have differentiated prestalk cells but are blocked in development. The mound arrest phenotype can be reversed by exposing the adgf mutant mounds to ammonia.

      Based on cell cycle phases, there exists a dichotomy of cell types that biases cell fate to prestalk or prespore (Weeks and Weijer, 1994; Jang and Gomer, 2011). Prestalk cells are enriched in acidic vesicles, and ammonia, by raising the pH of these vesicles and the cytoplasm (Davies et al., 1993; Van Duijn and Inouye, 1991), plays an active role in collective cell movement (Bonner et al., 1989). Thus, ammonia reinforces or maintains the positional information by elevating cAMP levels, favouring prespore differentiation (Bradbury and Gross, 1989; Riley and Barclay, 1990; Hopper et al., 1993).

      (3) By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what.

      When the adgf mutants were exposed to ammonia just after tight mound formation, tips developed within 4 h (Fig. 6). In contrast, adgf mounds not exposed to ammonia remained at the mound stage for at least 30 h. This demonstrates that starvation alone is not sufficient to drive tip development and ammonia serves as a cue that promotes the transition from mound to tipped mound formation. 

      Many mound arrest mutants are blocked in development and do not proceed to form fruiting bodies (Carrin et al., 1994). Furthermore, not all the mound arrest mutants tested in this study were rescued by ADA enzyme (Fig. S3 A), and they continue to stay as mounds without dispersing as spores, suggesting that mound arrest in Dictyostelium can result from multiple underlying defects, whereas ammonia is an important factor controlling transition from mound to tip formation.

      (4) One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a minuscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the ADGF cells in the mound - do they all form spores? Do some form spores?

      Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus, elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      A fraction of adgf mounds form bulkier spore heads by the end of 36 h as shown in Fig. 3. This late recovery may be due to the expression of other ADA isoforms. Mixing WT and adgf mutant cell lines results in a slug with the mutants occupying the prestalk region (Fig. 9) suggesting that WT ADGF favours prespore differentiation. However, it is not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of three other intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, we will isolate prestalk and prespore cells, and thereafter examine adgf expression in each population.

      ADGF activity is likely to be higher in the tip to remove excess adenosine, the tip-inhibiting molecule (Wang and Schaap, 1985). Moreover, our results show that adgf<sup>-</sup> cells with high adenosine preferentially migrate to the prestalk rather than the prespore region when mixed with WT cells. Ammonia generated from adenosine deamination could thus drive tip development and prespore differentiation.

      Reviewer #2 (Public review):

      Summary:

      The paper describes new insights into the role of adenosine deaminase-related growth factor (ADGF), an enzyme that catalyses the breakdown of adenosine into ammonia and inosine, in tip formation during Dictyostelium development. The ADGF null mutant has a pre-tip mound arrest phenotype, which can be rescued by the external addition of ammonia. Analysis suggests that the phenotype involves changes in cAMP signalling possibly involving a histidine kinase dhkD, but details remain to be resolved.

      Strengths:

      The generation of an ADGF mutant showed a strong mound arrest phenotype and successful rescue by external ammonia. Characterization of significant changes in cAMP signalling components, suggesting low cAMP signalling in the mutant and identification of the histidine kinase dhkD as a possible component of the transduction pathway. Identification of a change in cell type differentiation towards prestalk fate

      Weaknesses:

      (1) Lack of details on the developmental time course of ADGF activity and cell type-specific differences in ADGF expression.

      ADGF expression was examined at 0, 8, 12, and 16 h (Fig. 1), and the total ADA activity was assayed at 12 and 16 h (Fig. 4). As per the reviewer’s suggestion, we have now included the 12 h data (Fig. 4A) to provide additional insights into the kinetics of ADGF activity. The adgf expression was found to be highest at 16 h and hence, the ADA assay was carried out at that time point. However, the ADA assay will not exclusively reflect ADGF activity since it reports the activity of the three other isoforms as well.

      A fraction of adgf<sup>-</sup> mounds form bulkier spore heads by the end of 36 h as shown in Fig. 3. This late recovery may be due to the expression of the other ADA isoforms. Mixing WT and adgf mutant cell lines results in a slug with the mutants occupying the prestalk region (Fig. 9), suggesting that WT adgf favours prespore differentiation.

      However, it’s not clear if ADGF is secreted by a particular cell type, as adenosine can be produced by both cell types, and the activity of the other three intracellular ADAs may vary between the cell types. To address whether adgf expression is cell type-specific, we will isolate prestalk and prespore cells, and thereafter examine adgf expression in each population.

      ADGF activity is likely to be higher in the tip to remove excess adenosine, the tip-inhibiting molecule (Wang and Schaap, 1985). Moreover, our results show that adgf<sup>-</sup> cells with high adenosine preferentially migrate to the prestalk rather than the prespore region when mixed with WT cells.

      (2) The absence of measurements to show that ammonia addition to the null mutant can rescue the proposed defects in cAMP signalling.

      The cAMP levels were measured at two time points, 8 h and 12 h, in the mutant. The adgf mutant has lower ammonia levels (Fig. 6), diminished acaA expression (Fig. 7) and reduced cAMP levels (Fig. 7) in comparison to WT at both 12 and 16 h of development. Since ammonia is known to increase cAMP levels (Riley and Barclay, 1990; Feit et al., 2001), addition of ammonia to the mutant is likely to increase acaA expression, thereby rescuing the defects in cAMP signalling.

      (3) No direct measurements in the dhkD mutant to show that it acts upstream of adgf in the control of changes in cAMP signalling and tip formation.

      The histidine kinases dhkD and dhkC are reported to modulate phosphodiesterase RegA activity, thereby maintaining cAMP levels (Singleton et al., 1998; Singleton and Xiong, 2013). By activating RegA, dhkD ensures proper cAMP distribution within the mound, which is essential for the patterning of prestalk and prespore cells, as well as for tip formation (Singleton and Xiong, 2013). Therefore, ammonia exposure to dhkD mutants is likely to regulate cAMP signalling and thereby tip formation. We will address this issue by measuring cAMP levels in the dhkD mutant.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Overview:

      We appreciate all the constructive comments from the reviewer and the reviewing editor, as their suggestions have significantly improved our manuscript. In response to their comments, we have made several key revisions: First, we have performed new colocalization analyses between the active zone marker UNC-10::GFP and all UNC-13L variants (UNC-13L, UNC-13L<sup>HK</sup>, UNC-13L<sup>D1-5N</sup>, and UNC-13L<sup>HK+D1-5N</sup>, all tagged with mApple). These results confirm that the mutations do not affect synaptic localization. Second, we have provided a clearer explanation of the “gain-of-function” term used in this study, emphasizing that it reflects an increased SV release due to C1-C2B module dysfunction rather than a single mechanistic state. Third, we have expanded the discussion on the physiological implications of the C1-C2B model, particularly its role in regulating synaptic transmission under varying neuronal activity conditions. Finally, to improve clarity and focus, we have removed unnecessary speculative discussions, ensuring that the revised manuscript centers on the most relevant findings.

      We have reorganized the manuscript to incorporate these new results into the figures and text. Full responses to all reviewer comments are provided below. We hope that the reviewer and the editor find these revisions satisfactory and that our manuscript is now suitable for publication in eLife.

      Joint Public Review:

      Summary:

      In this manuscript, the authors investigate how different domains of the presynaptic protein UNC-13 regulate synaptic vesicle release in the nematode C. elegans. By generating numerous point mutations and domain deletions, they propose that two membrane-binding domains (C1 and C2B) can exhibit "mutual inhibition," enabling either domain to enhance or restrain transmission depending on its conformation. The authors also explore additional N-terminal regions, suggesting that these domains may modulate both miniature and evoked synaptic responses. From their electrophysiological data, they present a "functional switch" model in which UNC-13 potentially toggles between a basal state and a gain-of-function state, though the physiological basis for this switch remains partly speculative.

      Strengths:

      (1) The authors conduct a thorough exploration of how mutations in the C1, C2B, and other regulatory domains affect synaptic transmission. This includes single, double, and triple mutations, as well as domain truncations, yielding a large, informative dataset.

      (2) The study includes systematically measuring both spontaneous and evoked synaptic currents at neuromuscular junctions, under various experimental conditions (e.g., different Ca²⁺ levels), which strengthens the reliability of their functional conclusions.

      (3) Findings that different domain disruptions produce distinct effects on mEPSCs, mIPSCs, and evoked EPSCs suggest UNC-13 may adopt an elevated functional state to regulate synaptic transmission.

      Weaknesses:

      It remains unclear whether the various domain alterations truly converge on a single "gain-of-function" state or instead represent multiple pathways for enhancing UNC-13 activity. Different mutations selectively affect spontaneous or evoked release, suggesting that each variant may not share the same underlying mechanism. Moreover, many conclusions rely on combining domain deletions or point mutations, yet the electrophysiological data show distinct outcomes across EPSCs, IPSCs, mini, and evoked responses. This raises questions about whether these manipulations all act on the same pathway and whether their observed additivity or suppression genuinely reflects a single mechanistic process. A unifying model, or at least a clearer explanation of why the authors infer one mechanistic state across different domain manipulations, would strengthen the paper's conclusions.

      We appreciate the comment and understand the potential confusion regarding the use of the term "gain-of-function" in the manuscript. To clarify, the gain-of-function state described in this study does not refer to a single specific mechanistic change in UNC-13 but rather to a high synaptic vesicle (SV) release state achieved by disrupting the C1-C2B module - either through dysfunction of the C1 domain or the C2B domain (as seen with the HK and DN mutations).

      Our findings support a "seesaw" model in which the C1 and C2B domains maintain a dynamic balance in their interaction with the plasma membrane, binding to DAG and PIP2. This balance may increase the energy barrier for SV release, preventing excessive neurotransmitter release under basal conditions. However, the C1-C2B toggle may be disrupted by high neuronal activity and act in an unbalanced state, thereby enhancing synaptic transmission (i.e., the gain-of-function state). To address these concerns, we have provided a clearer explanation of this functional switch in the revised version of the manuscript (page 27).

      Regarding the differences between spontaneous and evoked neurotransmitter release, our previous studies have revealed that these two forms of release do not always respond similarly to various unc-13 mutations. This is a common phenomenon observed in other synaptic protein mutants, including synaptotagmin, tomosyn, and complexin, which indicates distinct yet partially overlapping regulatory mechanisms. Our model is well supported by most of the electrophysiological results from HK, DN, and HK+DN mutations across different unc-13 isoforms (UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, UNC-13ΔX). The main exception is that in UNC-13ΔX<sup>HK+DN</sup> mutants, the changes in mEPSCs and mIPSCs differ from those observed in evoked EPSCs. This suggests that the mechanisms regulating the functional switch of unc-13 may differ slightly between spontaneous and evoked release. Since the X region of unc-13 and Munc13 remains largely uncharacterized, our findings provide intriguing insights into its potential functional role.

      The manuscript proposes that UNC-13 toggles from a basal to a "gain-of-function" state under normal synaptic activity. However, it does not address when or how this switch might occur in vivo, since it is demonstrated principally via artificial mutations. Providing direct evidence or additional discussion of such switching under physiological conditions would be particularly informative.

      What is the physiological significance of the proposed gain-of-function state? The data suggest that certain mutants (e.g., HK+D1-5N) lacking the gain-of-function state can still support synaptic transmission at wild-type levels. How do the authors reconcile this with the idea that the gain-of-function state plays a critical role at the synapse?

      We appreciate these comments. While our model is mainly based on the dysfunction of the C1-C2B module (through HK and DN mutations), it provides a potential physiological framework for understanding how the structural balance of C1-C2B relates to the variability of synaptic transmission in the nervous system. In the CNS, synaptic transmission is highly variable, and the temporal pattern of the presynaptic activity may require dynamic switching of the fusion machinery, including UNC-13, between different functional modes, thereby triggering synaptic transmission at various levels. Our model suggests that under conditions of high neuronal activity, the C1-C2B module may transition from a balanced to an unbalanced state (gain-of-function state), thereby enhancing synaptic transmission.

      Regarding the physiological significance of the gain-of-function state, we acknowledge that certain mutants (e.g., HK+D1-5N) lacking this state can still support wild-type levels of synaptic transmission. This observation suggests that the gain-of-function state may not be strictly required for baseline synaptic function but rather plays a modulatory role under specific conditions, such as heightened neuronal activity or synaptic plasticity. Further investigations will be needed to determine the precise in vivo triggers and functional consequences of this switch under physiological conditions. Moreover, we will focus on several linker regions (between C1 and C2B, C2B and MUN) to investigate their potential roles in regulating synaptic transmission and their broader functional significance in UNC-13 dynamics.

      The authors determined the fluorescence intensity of mApple-tagged UNC-13 variants (Figure 1J-K and Figure 7J-K), finding no significant changes compared to the wild-type. However, a more detailed analysis of the density or distribution of fluorescent puncta in axons could clarify whether certain mutations alter the localization of UNC-13 at synapses. Demonstrating colocalization with wild-type UNC-13 (or another presynaptic marker) would help rule out mislocalization effects.

      We appreciate the comment. In response, we have included a more detailed analysis of the synaptic localization of both wild-type and mutated UNC-13L in the revised manuscript. Our data show that in all scenarios, UNC-13 proteins exhibit strong colocalization with the active zone marker UNC-10::GFP (Figure 1L). Along with the fluorescence intensity data in Figure 1J, our findings indicate that the C1 and C2B mutations do not affect the expression level or the localization of UNC-13 at synapses. These results have been incorporated into the revised manuscript (page 8) and in Figure 1L.

      The study mainly relies on extrachromosomal transgenes, which can show variable copy numbers and expression levels among individual worm strains. This variability might complicate interpretation, as differences in expression could mask or exaggerate certain phenotypes.

      We agree that the expression levels of synaptic proteins can influence synaptic transmission levels. However, given the large number of mutations and truncations employed in this study, generating single-copy rescue lines for all transgenic strains would be a significant undertaking. On average, we need to microinject 50-100 worms to obtain one single-copy line, whereas injecting only 5-10 worms allows us to generate at least three independent extrachromosomal arrays. Based on our previous work, we found that the synaptic transmission levels are comparable between various extrachromosomal rescue arrays of unc-13 and their single-copy rescue lines (e.g., UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, UNC-13ΔC2B, etc.). In future studies, we aim to use single-copy expression or CRISPR-based methods to induce deletions or mutations in various synaptic proteins.

      Finally, the discussion is somewhat diffused. Streamlining the text to focus on the most direct connections would help readers pinpoint the key conclusions and open questions.

      We appreciate the comment. As suggested, we have refined the discussion section. Specifically, we have removed the last part of the discussion (Functional roles of the linkers in UNC-13).  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the "Gain-of-Function" State. Provide stronger justification or explicit discussion of whether all manipulations that enhance SV release truly correspond to the same mechanistic state or if multiple conformational states might be at play.

      The “gain-of-function” state in this manuscript refers to a specific conformational status of UNC-13 that enhances synaptic vesicle (SV) release probability (both spontaneous and evoked) as a result of mutations (HK and DN) in the C1 and C2B domains. This effect is observed across multiple UNC-13 isoforms, including UNC-13L, UNC-13S, and UNC-13R. Prior studies from our group and others have demonstrated that C1 and C2B exhibit conserved functions in regulating synaptic transmission (Li et al., 2019, Cell Reports; Liu et al., 2021, Cell Reports; Michelassi et al., 2017, Neuron), supporting the idea that these domains share a common mechanism for modulating SV release. Given that C1 and C2B act as a functional unit (Michelassi et al., 2017, Neuron; and this study), we define all synaptic states induced by the dysfunction of these two domains as the "gain-of-function" mode.

      However, it is important to note that this classification does not apply to high-release probability states induced by mutations in other domains.

      The concept of a gain-of-function state due to C1 and C2B dysfunction has been previously proposed in studies of Munc13. Basu et al. (2007, Journal of Neuroscience) demonstrated that the H567K mutation in Munc13-1 C1 increases both spontaneous and evoked release probability, leading to a gain-of-function mode. Similarly, work from the Südhof group showed that KW and DN mutations in Munc13-1 C2B also enhance release probability, thereby inducing a gain-of-function state (Shin et al., 2010, Nature Structural & Molecular Biology). Our recent findings further support this idea, showing that UNC-13 C2B D3,4N (Li et al., 2019, Cell Reports; Liu et al., 2021, Cell Reports; Michelassi et al., 2017, Neuron) and the newly identified D1-5N mutation (this study) significantly elevate SV release, consistent with the D1,2N mutations reported by Shin et al.

      Overall, our study integrates and extends previous findings, providing strong evidence that the C1 and C2B domains function as a regulatory switch between a basal physiological mode, a gain-of-function mode (enhanced release), and a loss-of-function mode (impaired release). This framework advances our understanding of how C1 and C2B dysfunction affects synaptic transmission and plasticity.

      (2) Add comparisons to wild-type UNC-13L: When presenting data for deletions/mutants as "controls," include a visual reference (e.g., dashed line in figures) showing wild-type UNC-13L levels. This will help readers see whether each construct is above or below the normal activity baseline.

      As suggested, a dashed line showing the level of UNC-13L has been added to the bar graphs of all evoked EPSCs. The functional switch model is well supported by the results of the evoked EPSCs.

      (3) Mutant and wild-type UNC-13 colocalization analysis: Demonstrating whether each mutant localizes robustly to synapses, in comparison to wild-type UNC-13, would bolster the interpretation of electrophysiological changes. If the authors have these data, adding them would address the possibility of mislocalization.

      We agree with the reviewer that there would be value in addressing the possibility of mislocalization. However, in our experience working with UNC-13 mutant colocalization, we have found that neither deleting the X, C1 and C2B domains in UNC-13L nor deleting the C1 and C2B domains in UNC-13MR or UNC-13R altered the synaptic colocalization with the active zone protein UNC-10/RIM (Li 2019, Liu 2021), suggesting that the C1 and C2B domains in UNC-13 are not involved in the regulation of protein localization. Thus, the mutations in the C1 and C2B domains are unlikely to lead to protein mislocalization in the synaptic region.

      (4) If possible, adding analysis using single-copy transgenes to confirm that extrachromosomal array expression variability does not qualitatively change the conclusions.

      We strongly agree with the reviewer that single-copy transgenes would provide more stable protein expression levels and further consolidate our conclusions. However, several factors give us confidence that the extrachromosomal array rescue approach does not introduce significant variability in our results: First, our prior research has shown that SV release levels are generally comparable between extrachromosomal arrays carrying various unc-13 transgenes and their corresponding single-copy rescue lines (e.g., UNC-13L, UNC-13S, UNC-13R, UNC-13ΔC2A, and UNC-13ΔC2B). Second, the major conclusions in this study are drawn from highly consistent and robust changes in SV release between different rescue lines (e.g., UNC-13L<sup>HK+DN</sup> vs UNC-13L<sup>DN</sup>; UNC-13S<sup>HK+DN</sup> vs UNC-13S<sup>HK</sup> or UNC-13S<sup>DN</sup>). Third, our imaging data indicate that the protein levels are indistinguishable between different unc-13 rescue arrays carrying C1 and C2B mutations, further supporting the validity of our findings.

      Additionally, due to our recent relocation to a new institute, we are still in the process of setting up our microinjection system. Generating single-copy transgenes for all the extrachromosomal arrays used in this study would require significant time. We appreciate the reviewer’s understanding of our current situation. In future studies of unc-13 and other synaptic proteins, we will use single-copy expression in preference to extrachromosomal arrays.

      (5) Reduce the length and speculation in the Discussion. A concise discussion that focuses on the most direct implications of the present findings will help improve the readability of this paper.

      We appreciate the comment. As suggested, we have refined the discussion section.

      Specifically, the last part of the discussion (Functional roles of the linkers in UNC-13) was removed.

      (6) Minor formatting detail: In Figure 5C (left panel), adjust the y-axis label to ensure it aligns properly and improves clarity.

      We appreciate the reviewer’s suggestion and have adjusted the y-axis label accordingly in the revised version (see revised Figure 5).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study introduces a useful deep learning-based algorithm that tracks animal postures with reduced drift by incorporating transformers for more robust keypoint detection. The efficacy of this new algorithm for single-animal pose estimation was demonstrated through comparisons with two popular algorithms. However, the analysis is incomplete and would benefit from comparisons with other state-of-the-art methods and consideration of multi-animal tracking.

      First, we would like to express our gratitude to the eLife editors and reviewers for their thorough evaluation of our manuscript. ADPT aims to improve the accuracy of body point detection and tracking in animal behavior, facilitating more refined behavioral analyses. The insights provided by the reviewers have greatly enhanced the quality of our work, and we have addressed their comments point-by-point.

      In this revision, we have included additional quantitative comparisons of multi-animal tracking capabilities between ADPT and other state-of-the-art methods. Specifically, we have added evaluations involving homecage social mice and marmosets to comprehensively showcase ADPT’s advantages from various perspectives. This additional analysis will help readers better understand how ADPT effectively overcomes point drift and expands its applicability in the field.

      Reviewer #1:

      In this paper, the authors introduce a new deep learning-based algorithm for tracking animal poses, especially in minimizing drift effects. The algorithm's performance was validated by comparing it with two other popular algorithms, DeepLabCut and LEAP. The accessibility of this tool for biological research is not clearly addressed, despite its potential usefulness. Researchers in biology often have limited expertise in deep learning training, deployment, and prediction. A detailed, step-by-step user guide is crucial, especially for applications in biological studies.

      We appreciate the reviewers' acknowledgment of our work. While ADPT demonstrates superior performance compared to DeepLabCut and SLEAP, we recognize that the absence of a user-friendly interface may hinder its broader application, particularly for users with a background solely in biology. In this revision, we have enhanced the command-line version of the user tutorial to provide a clear, step-by-step guide. Additionally, we have developed a simple graphical user interface (GUI) to further support users who may not have expertise in deep learning, thereby making ADPT more accessible for biological research.

      The proposed algorithm focuses on tracking and is compared with DLC and LEAP, which are more adept at detection rather than tracking.

      In the field of animal pose estimation, the distinction between detection and tracking is often blurred. For instance, the title of the paper "SLEAP: A deep learning system for multi-animal pose tracking" refers to "tracking," while "detection" is characterized as "pose estimation" in the body text. Similarly, "Multi-animal pose estimation, identification, and tracking with DeepLabCut" uses "tracking" in the title, yet "detection" is also mentioned in the pose estimation section. We acknowledge that referencing these articles may have contributed to potential confusion.

      To address this, we have clarified the distinction between "tracking" and "detection" in the Results section under "Anti-drift pose tracker" (see lines 118-119). In this paper, we now explicitly use “track” to refer to the tracking of all body points or poses of an individual, and “detect” for specific keypoints.

      Reviewer #1 recommendations:

      (1) DLC and LEAP are mainly good in detection, not tracking. The authors should compare their ADPT algorithm with idtracker.ai, ByteTrack, and other advanced tracking algorithms, including recent track-anything algorithms.

      (2) DeepPoseKit is outdated and no longer maintained; a comparison with the T-REX algorithm would be more appropriate.

      We appreciate the reviewer's suggestion for a more comprehensive comparison and acknowledge the importance of including these advanced tracking algorithms. However, we have not yet found suitable publicly available datasets for such comparative testing. We appreciate this insight and will consider incorporating T-REX into future comparisons.

      (3) The authors primarily compared their performance using custom data. A systematic comparison with published data, such as the dataset reported in the paper "Multi-animal pose estimation, identification, and tracking with DeepLabCut," is necessary. A detailed comparison of the performances between ADPT and DLC is required.

      In the previous version of our manuscript, we included the SLEAP single-fly public dataset and the OMS_dataset from OpenMonkeyStudio for performance comparisons. We recognize that these datasets were not comprehensive. In this revision, we have added the marmoset dataset from "Multi-animal pose estimation, identification, and tracking with DeepLabCut" and a customized homecage social mice dataset to enhance our comparative analysis of multi-animal pose estimation performance. Our comprehensive comparison reveals that ADPT outperforms both DLC and SLEAP, as discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" (Figure 1, see lines 303-332).

      (4) Given the focus on biological studies, an easy-to-use interface and introduction are essential.

      In this revision, we have not only developed a GUI for ADPT but also included a more detailed tutorial. This can be accessed at https://github.com/tangguoling/ADPT-TOOLBOX

      Reviewer #2:

      The authors present a new model for animal pose estimation. The core feature they highlight is the model's stability compared to existing models in terms of keypoint drift. The authors test this model across a range of new and existing datasets. The authors also test the model with two mice in the same arena. For the single animal datasets the authors show a decrease in sudden jumps in keypoint detection and the number of undetected keypoints compared with DeepLabCut and SLEAP. Overall average accuracy, as measured by root mean squared error, generally shows similar but sometimes superior performance to DeepLabCut and better performance compared to SLEAP. The authors confusingly don't quantify the performance of pose estimation in the multi (two) animal case instead focusing on detecting individual identity. This multi-animal model is not compared with the model performance of the multi-animal mode of DeepLabCut or SLEAP.

      We appreciate the reviewer's thoughtful assessment of our manuscript. Our study focuses on addressing the issue of keypoint drift prevalent in animal pose estimation methods like DeepLabCut and SLEAP. During the model design process, we discovered that the structure of our model also enhances performance in identifying multiple animals. Consequently, we included some results related to multi-animal identity recognition in our manuscript.

      In recent developments, we are working to broaden the applicability of ADPT for multi-animal pose estimation and identity recognition. Given that our manuscript emphasizes pose estimation, we have added a comparison of anti-drift performance in multi-animal scenarios in this revision. This quantifies ADPT's capability to mitigate drift in multi-animal pose estimation.

      Using our custom Homecage social mice dataset, we compared ADPT with DeepLabCut and SLEAP. The results indicate that ADPT achieves more accurate anti-drift pose estimation for two mice, with superior keypoint detection accuracy. Furthermore, we also evaluated pose estimation accuracy on the publicly available marmoset dataset, where ADPT outperformed both DeepLabCut and SLEAP. These findings are discussed in the Results section under "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals."

      The first is a tendency to make unsubstantiated claims that suggest either model performance that is untested or misrepresents the presented data, or suggest excessively large gaps in current SOTA capabilities. One obvious example is in the abstract when the authors state ADPT "significantly outperforms the existing deep-learning methods, such as DeepLabCut, SLEAP, and DeepPoseKit." All tests in the rest of the paper, however, only discuss performance with DeepLabCut and SLEAP, not DeepPoseKit. At this point, there are many animal pose estimation models so it's fine they didn't compare against DeepPoseKit, but they shouldn't act like they did.

      We appreciate the reviewer's feedback regarding unsubstantiated claims in our manuscript. Upon careful review, we acknowledge that our previous revisions inadvertently included statements that may misrepresent our model's performance. In particular, we have revised the abstract to eliminate the mention of DeepPoseKit, as our comparisons focused exclusively on DeepLabCut and SLEAP.

      In addition to this correction, we have thoroughly reviewed the entire manuscript to address other instances of ambiguity and ensure that our claims are well-supported by the data presented. Thank you for bringing this to our attention; we are committed to maintaining the integrity of our claims throughout the paper.

      In terms of making claims that seem to stretch the gaps in the current state of the field, the paper makes some seemingly odd and uncited statements like "Concerns about the safety of deep learning have largely limited the application of deep learning-based tools in behavioral analysis and slowed down the development of ethology" and "So far, deep learning pose estimation has not achieved the reliability of classical kinematic gait analysis" without specifying which classical gait analysis is being referred to. Certainly, existing tools like DeepLabCut and SLEAP are already widely cited and used for research.

      In this revision, we have carefully reviewed the entire manuscript and addressed the instances of seemingly odd and unsubstantiated claims. Specifically, we have changed the wording "largely limited" to "limited" to ensure accuracy and clarity. Additionally, we thoroughly reviewed the citation list to ensure proper attribution, incorporating references such as "A deep learning-based toolbox for Automated Limb Motion Analysis (ALMA) in murine models of neurological disorders" to better substantiate our claims and provide a clearer context.

      We have also added an additional section to comprehensively discuss the applications of widely-used tools like DeepLabCut and SLEAP in behavioral research. This new section elaborates on the challenges and limitations researchers encounter when applying these methods, highlighting both their significant contributions and the areas where improvements are still needed.

      The other main weakness in the paper is the validation of the multi-animal pose estimation. The core point of the paper is pose estimation and anti-drift performance and yet there is no validation of either of these things relating to multi-animal video. All that is quantified is the ability to track individual identity with a relatively limited dataset of 10 mice IDs with only two in the same arena (and see note about train and validation splits below). While individual tracking is an important task, that literature is not engaged with (i.e. papers like Walter and Couzin, eLife, 2021: https://doi.org/10.7554/eLife.64000) and the results in this paper aren't novel compared to that field's state of the art. On the other hand, while multi-animal pose estimation is also an important problem the paper doesn't engage with those results either. The two methods already used for comparison in the paper, SLEAP and DeepPoseKit, already have multi-animal models and multi-animal annotated datasets but none of that is tested or engaged with in the paper. The paper notes many existing approaches are two-step methods, but, for practitioners, the difference is not enough to warrant a lack of comparison.

      We appreciate the reviewer's insights regarding the validation of multi-animal pose estimation in our paper. While our primary focus has been on pose estimation and anti-drift performance, we recognize the importance of validating these aspects within the context of multi-animal videos.

      In this revision, we have included a comparison of ADPT's anti-drift performance in multi-animal pose estimation, utilizing our custom Homecage social mouse dataset (Figure 1A). Our findings indicate that ADPT achieves more accurate pose estimation for two mice while significantly reducing keypoint drift, outperforming both DeepLabCut and SLEAP (see lines 311-322). We trained each model three times, and this figure presents the results from one of those training sessions. We calculated the average RMSE between predictions and manual labels, demonstrating that ADPT achieved an average RMSE of 15.8 ± 0.59 pixels, while DeepLabCut (DLC) and SLEAP recorded RMSEs of 113.19 ± 42.75 pixels and 94.76 ± 1.95 pixels, respectively (Figure 1C). ADPT achieved an accuracy of 6.35 ± 0.14 pixels based on the DLC evaluation metric across all body parts of the mice, while DLC reached 7.49 ± 0.2 pixels (Figure 1D). ADPT achieved 8.33 ± 0.19 pixels using the SLEAP evaluation metric across all body parts of the mice, compared to SLEAP’s 9.82 ± 0.57 pixels (Figure 1E).
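
      As an illustration of the RMSE comparison described above, the following is a minimal sketch, not the authors' evaluation code. The function names `keypoint_rmse` and `summarize_runs`, the array layout (frames × keypoints × 2, in pixels), the use of NaN to mark missing keypoints, and the reading of "±" as a sample standard deviation across training runs are all assumptions made for this example. The DLC and SLEAP evaluation metrics mentioned above may be defined differently and are not reproduced here.

```python
import numpy as np


def keypoint_rmse(pred, truth):
    """Root-mean-square Euclidean error, in pixels, over all valid keypoints.

    pred, truth: arrays of shape (n_frames, n_keypoints, 2).
    Assumption: a keypoint with NaN in either array is missing and is skipped.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    # Squared pixel distance for each (frame, keypoint) pair.
    sq_dist = np.sum((pred - truth) ** 2, axis=-1)
    valid = ~np.isnan(sq_dist)
    if not valid.any():
        return float("nan")
    return float(np.sqrt(sq_dist[valid].mean()))


def summarize_runs(rmse_per_run):
    """Mean and sample standard deviation across independent training runs."""
    values = np.asarray(rmse_per_run, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))
```

      With three training runs per model, `summarize_runs` would take the three per-run RMSE values and return the pair reported as "mean ± spread", under the assumptions stated above.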

      Furthermore, we have evaluated pose estimation accuracy on the publicly available marmoset dataset from DeepLabCut, where ADPT also outperformed DeepLabCut and SLEAP. These results can be found in the "ADPT can be adapted for end-to-end pose estimation and identification of freely social animals" section of the Results (see lines 323-329).

      We acknowledge the existing literature on multi-animal tracking, such as the work by Walter and Couzin (2021). While individual tracking is crucial, our primary focus lies in tracking animal poses effectively and minimizing drift during this process. This dual emphasis on pose tracking and anti-drift performance distinguishes our work and aligns with ongoing advancements in the field. Contextualizing our results within the broader tracking literature shows that, although our findings may overlap with existing methods, the focus on improving tracking stability and reducing drift is a valuable contribution to the field. Thank you for your valuable feedback, which has helped us improve the robustness of our manuscript.

      The authors state that "The evaluation of our social tracking capability was performed by visualizing the predicted video data (see supplement Videos 3 and 4)." While the authors report success maintaining mouse ID, when one actually watches the key points in the video of the two mice (only a single minute was used for validation) the pose estimation is relatively poor with tails rarely being detected and many pose issues when the mice get close to each other.

      We acknowledge that there are indeed challenges in pose estimation, particularly when the two mice get close to each other, leading to tracking failures and infrequent detection of tails in the predicted videos. The reasons for these issues can be summarized as follows:

      Lack of Training Data from Real Social Scenarios: The training data used for the social tracking assessment were primarily derived from the Mix-up Social Animal Dataset, which does not fully capture the complexities of real social interactions. In future work, we plan to incorporate a blend of real social data and the Mix-up data for model training. Specifically, we aim to annotate images where two animals are in close proximity or interacting to enhance the model's understanding of genuine social behaviors.

      Challenges in Tail Tracking in Social Contexts: Tracking the tails of mice in social situations remains a significant challenge. To validate this, we have added an assessment of tracking performance in real social settings using homecage data. Our findings indicate that using annotated data from real environments significantly improves tail tracking accuracy, as demonstrated in the supplementary video.

      We appreciate your feedback, which highlights critical areas for improvement in our model.

      Finally, particularly in the methods section, there were a number of places where what was actually done wasn't clear.

      We have carefully reviewed and revised the corresponding parts to clarify the previously unclear statements. Thank you for your valuable feedback, which has helped improve the clarity of our methods.

      For example in describing the network architecture, the authors say "Subsequently, network separately process these features in three branches, compute features at scale of one-fourth, one-eight and one-sixteenth, and generate one-eight scale features using convolution layer or deconvolution layer." Does only the one-eight branch have deconvolution or do the other branches also?

      We apologize for the confusion this has caused. Upon reviewing our manuscript, we identified an error in the diagram. In the revised version, we have clarified that the model samples feature maps at multiple resolutions and ultimately integrates them at the 1/8 resolution for feature fusion. Specifically, the 1/4 feature map from ResNet50's stack 2 is processed through max-pooling and convolution to generate a 1/8 feature map. Additionally, the 1/4 feature map from ResNet50's stack 2 is also transformed into a 1/8 feature map using a convolution operation with a stride of 2. Finally, both the input and output of the transformer are at the 1/16 resolution, which can be trained on a 2080Ti GPU. The 1/16 feature map is then upsampled to produce the final 1/8 feature map. We have updated the manuscript to reflect these changes, and we also modified the model architecture diagram for better clarity.
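
      The resolution bookkeeping described above can be sketched as follows. This is only an illustration of how three branches end at the same 1/8 scale; it assumes a hypothetical input whose width and height are divisible by 16 so that no rounding occurs, and the function and branch names are placeholders rather than identifiers from the ADPT code.

```python
def downsample(size, factor):
    """Spatial size after reducing resolution by an integer factor."""
    width, height = size
    if width % factor or height % factor:
        raise ValueError("this sketch assumes sizes divisible by the factor")
    return width // factor, height // factor


def upsample(size, factor):
    """Spatial size after increasing resolution by an integer factor."""
    width, height = size
    return width * factor, height * factor


def branch_output_sizes(input_size):
    """Sizes of the three branch outputs that are fused at 1/8 scale."""
    quarter = downsample(input_size, 4)      # 1/4-scale backbone feature map
    sixteenth = downsample(input_size, 16)   # transformer input and output
    return {
        "max_pool_then_conv": downsample(quarter, 2),  # 1/4 -> 1/8
        "stride_2_conv": downsample(quarter, 2),       # 1/4 -> 1/8
        "transformer_upsampled": upsample(sixteenth, 2),  # 1/16 -> 1/8
    }


# Hypothetical input size chosen only because it divides evenly by 16.
sizes = branch_output_sizes((640, 480))
print(sizes)
```

      Because all three outputs share one spatial size, they can be fused channel-wise; how the real network handles inputs that do not divide evenly is not described in the response.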

      Similarly, for the speed test, the authors say "Here we evaluate the inference speed of ADPT. We compared it with DeepLabCut and SLEAP on mouse videos at 1288 x 964 resolution", but in the methods section they say "The image inputs of ADPT were resized to a size that can be trained on the computer. For mouse images, it was reduced to half of the original size." Were different image sizes used for training and validation? Or did ADPT not use 1288 x 964 resolution images as input, which would obviously have major implications for the speed comparison?

      For our inference speed evaluation, all models, including ADPT, used images with a resolution of 1288 x 964. In ADPT's processing pipeline, the first layer is a resizing layer designed to compress the images to a scale determined by the global scale parameter. For the mouse images, we set the global scale to 0.5, allowing our GPU to handle the data at that resolution during transformer training.

      We recorded the time taken by ADPT to process the entire 15-minute mouse video, which included the time taken for the resizing operation, and subsequently calculated the frames per second (FPS). We have clarified this process in the manuscript, particularly in the "Network Architecture" section, where we specify: "Initially, ADPT will resize the images to a scale (a hyperparameter, consistent with the global scale in the DLC configuration)."
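
      The two quantities discussed above, the resized input size and the end-to-end FPS, can be sketched as below. The helper names are hypothetical, rounding to the nearest pixel is an assumption, and the frame count and elapsed time in the usage lines are placeholders rather than measurements from the manuscript.

```python
def resized_shape(width, height, global_scale):
    """Image size after the initial resizing layer (rounded to whole pixels)."""
    return round(width * global_scale), round(height * global_scale)


def frames_per_second(n_frames, elapsed_seconds):
    """End-to-end throughput; elapsed time should include the resizing step."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_frames / elapsed_seconds


# 1288 x 964 frames with a global scale of 0.5, as stated in the response.
print(resized_shape(1288, 964, 0.5))

# Placeholder numbers for illustration only.
print(frames_per_second(n_frames=27000, elapsed_seconds=300.0))
```

      Timing the whole pipeline from full-resolution frame to prediction keeps the comparison fair, because every model is then charged for its own preprocessing.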

      Similarly, for the individual ID experiments, the authors say "In this experiment, we used videos featuring different identified mice, allocating 80% of the data for model training and the remaining 20% for accuracy validation." Were frames from each video randomly assigned to the training or validation sets? Frames from the same video are very correlated (two frames could be just 1/30th of a second different from each other), and so if training and validation frames are interspersed with each other validation performance doesn't indicate much about performance on more realistic use cases (i.e. using models trained during the first part of an experiment to maintain ids throughout the rest of it.)

      In our study, we actually utilized the first 80% of frames from each video for model training and the remaining 20% for testing the model's ID tracking accuracy. We have revised the relevant description in the manuscript to clarify this process. The updated description can be found in the "Datasets" section under "Mouse Videos of Different Individuals."
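
      A chronological split of this kind can be sketched as follows. The function name and the integer-percentage interface are illustrative assumptions; the point is only that every training frame precedes every test frame within a video, so temporally adjacent frames do not end up on both sides of the split.

```python
def chronological_split(frames, train_percent=80):
    """Split an ordered frame sequence into an early and a late part.

    The first `train_percent` percent of frames go to training and the
    remainder to testing; no shuffling is performed.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    frames = list(frames)
    cut = len(frames) * train_percent // 100
    return frames[:cut], frames[cut:]


# Hypothetical 10-frame video, indexed in recording order.
train_frames, test_frames = chronological_split(range(10))
print(train_frames, test_frames)
```

      Applying the split separately to each video, as described above, keeps the held-out 20% at the end of every recording.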

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aims to uncover molecular and structural details underlying the broad substrate specificity of glycosaminoglycan lyases belonging to a specific family (PL35). They determined the crystal structures of two such enzymes, conducted in vitro enzyme activity assays, and a thorough structure-guided mutagenesis campaign to interrogate the role of specific residues. They made progress towards achieving their aims but I see significant holes in data that need to be determined and in the authors' analyses.

      Impact on the field:

      I expect this work will have a limited impact on the field, although, with additional experimental work and better analysis, this paper will be able to stand on its own as a solid piece of structure-function analysis.

      Strengths:

      The major strengths of the study were the combination of structure and enzyme activity assays, comprehensive structural analysis, as well as a thorough structure-guided mutagenesis campaign.

      Weaknesses:

      There were several weaknesses, particularly:

      (1) The authors claim to have done an ICP-MS experiment to show Mn2+ binds to their enzyme but did not present the data. The authors could have used the anomalous scattering properties of Mn2+ at the synchrotron to determine the presence and location of this cation (i.e. fluorescence spectra, and/or anomalous data collection at the Mn2+ absorption peak).

      Thank you for your kind comment and suggestion. Many studies have used ICP-MS to detect metal ions within proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), so we used this method to determine the type of metal ions within GAGases. In the revised manuscript, the ICP-MS data are presented in “Supplemental Table S1”.

      (2) The authors have an over-reliance on molecular docking for understanding the position of substrates bound to the enzyme. The docking analysis performed was cursory at best; Autodock Vina is a fine program but more rigorous software could have been chosen, as well as molecular dynamics simulations. As well, the authors do not use any substrate/product-bound structures from the broader PL enzyme family to guide the placement of the substrates in the GAGases, or to interpret the molecular docking models.

      Thank you for your kind comments. The interaction between the enzyme and ligand should be confirmed by resolving the structure of the enzyme-ligand complex. Unfortunately, we tried to prepare co-crystals of GAGases with various oligosaccharide substrates but ultimately failed. We therefore used docking with Autodock Vina to explain the catalytic mechanism of the polysaccharide lyases, although this approach may be open to question. In the revised manuscript, we predicted the substrate binding site of GAGase II using Caver Web 1.2 and, in parallel, performed molecular docking near the substrate binding site using Molecular Operating Environment (MOE) to verify the accuracy of the docking results (Figure 6, Supplemental Figure S4). In addition, a series of enzyme-substrate complex structures of identified PL family enzymes with structural similarity to the GAGases are shown in Supplemental Figure S2. The positions of the catalytic cavities and the substrate binding modes in these structures are similar to those in the molecular docking results, which further supports the reliability of our docking results.

      (3) The conclusion that the structures of GAGase II and VII are most similar to the structures of alginate lyases (Table 2 data), and the authors' reliance on DALI, are both questioned. DALI uses a global alignment algorithm, which when used for multi-domain enzymes such as these tends to result in sub-optimal alignment of active site residues, particularly if the active site is formed between the two domains as is the case here. The authors should evaluate local alignment methods focused on the optimization of the superposition of a single domain; these methods may result in a more appropriate alignment of the active site residues and different alignment statistics. This may influence the overall conclusion of the evolutionary history of these PL35 enzymes.

      Thank you for your kind question. As you suggested, multiple structural alignments were carried out for the (α/α)<sub>n</sub> toroid and the antiparallel β-sheet domain, respectively, based on the structures of GAGs/alginate lyases from the PL5, PL8, PL12, PL15, PL17, PL21, PL23, PL36, PL38 and PL39 families. The results showed that the overall structure of GAGases is more similar to that of PL15, PL17 and PL39 family alginate lyases, which have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet domain (Table 3). In terms of the toroid and antiparallel β-sheet domains, most of them have an (α/α)<sub>6</sub> toroid and an antiparallel β-sheet, as shown in Table 3. We also noticed that GAGases possess an (α/α)<sub>6</sub> toroid structure rather than an (α/α)<sub>7</sub> toroid structure, and revised the relevant statement in the manuscript.

      (4) The data on the GAGase III residue His188 is not well interpreted; substitution of this residue clearly impacts HA and HS hydrolysis as well. The data on the impact on alginate hydrolysis is weak, which could be due to the fact that the WT enzyme has poor activity against alginate to start with.

      Thank you very much for your helpful comments and questions. To test your suggestion that the weak effect on alginate hydrolysis could be due to the poor activity of wild-type GAGase III, we degraded alginate using different amounts of enzyme (3 to 30 μg) and analyzed the degradation products. The results showed that the alginate-degrading activity of GAGase III-H188A and GAGase III-H188N was abolished, even at a high ratio of mutated enzyme to substrate, such as 30 μg enzyme to 30 μg substrate (Supplemental Figure S3A), while their GAG-degrading activity was only partially affected, indicating that this residue plays a more important role in the digestion of alginate than of other substrates. Unfortunately, we were unable to confer the alginate-degrading ability of GAGase III on GAGase II through the N191H mutation. Therefore, we suggest that His<sup>188</sup> plays a key role in the specificity of alginate degradation by GAGase III, but that other determinants also contribute to this process. We will try more methods to obtain enzyme-substrate co-crystal structures and explain the substrate-selective mechanism in future studies.

      (5) The authors did not use the words "homology", "homologous", or "homolog" correctly (these terms mean the subjects have a known evolutionary relationship, which may or may not be known in the contexts the authors used these targets); the words "similarity" and "similar" are recommended to be used instead.

      Thank you for your helpful suggestions. We have revised the relevant part of the description in the manuscript.

      (6) The authors discuss a "shorter" cavity in GAGases, which does not make sense and is not supported by any figure or analysis. I recommend a figure with a surface representation of the various enzymes of interest, with dimensions of the cavity labeled (as a supplemental figure). The authors also do not specifically define what subsites are in the context of this family of enzymes, nor do they specifically label or indicate the location of the subsites on the figures of the GAGase II and IV enzyme structures.

      Thank you for your helpful suggestions. Figures (Supplemental Figure S2) with surface representations of GAGase II and some structurally similar GAGs/alginate lyases, with the dimensions of the cavity labeled, were added to the supplementary data as you suggested. Considering the correlation between enzyme specificity and substrate binding sites, we speculated that a shorter substrate binding cavity might allow the enzyme to accommodate a wider variety of substrates, so that the catalytic cavity places fewer restrictions on substrate binding. This speculation still needs to be verified by solving the crystal structures of the enzyme-substrate complexes.

      Reviewer #2 (Public review):

      Summary:

      Wei et al. present the X-ray crystallographic structures of two PL35 family glycosaminoglycan (GAG) lyases that display a broad substrate specificity. The structural data show that there is a high degree of structural homology between these enzymes and GAGases that have previously been structurally characterized. Central to this are the N-terminal (α/α)7 toroid domain and the C-terminal two-layered β-sheet domain. Structural alignment of these novel PL35 lyases with previously deposited structures shows a highly conserved triplet of residues at the heart of the active sites. Docking studies identified potentially important residues for substrate binding and turnover, and subsequent site-directed mutagenesis paired with enzymatic assays confirmed the importance of many of these residues. A third PL35 GAGase that is able to turn over alginate was not crystallized, but a predicted model showed a conserved active site Asn was mutated to a His, which could potentially explain its ability to act on alginate. Mutation of the His into either Ala or Asn abrogated its activity on alginate, providing supporting evidence for the importance of the His. Finally, a catalytic mechanism is proposed for the activity of the PL35 lyases. Overall, the authors used an appropriate set of methods to investigate their claims, and the data largely support their conclusions. These results will likely provide a platform for further studies into the broad substrate specificity of PL35 lyases, as well as for studies into the evolutionary origins of these unique enzymes

      Strengths:

      The crystallographic data are of very high quality, and the use of modern structural prediction tools to allow for comparison of GAGase III to GAGase II/GAGase VII was nice to see. The authors were comprehensive in their comparison of the PL35 lyases to those in other families. The use of molecular docking to identify key residues and the use of site-directed mutagenesis to investigate substrate specificity was good, especially going the extra distance to mutate the conserved Asn to His in GAGase II and GAGase VII.

      Weaknesses:

      The structural models simply are not complete. A cursory look at the electron density and the models show that there are many positive density peaks that have not had anything modelled into them. The electron density also does not support the placement of a Mn2+ in the model. The authors indicate that ICP-MS was done to identify the metal, but no ICP-MS data is presented in the main text or supplementary. I believe the authors put too much emphasis on the possibility of GAGase III representing an evolutionary intermediate between GAG lyases and alginate lyases based on a single Asn to His mutation in the active site, and I don't believe that enough time was spent discussing how this "more open and shorter" catalytic cavity would necessarily mean that the enzyme could accommodate a broader set of substrates. Finally, the proposed mechanism does not bring the enzyme back to its starting state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) The number of significant digits used in Table 1 and the Figure 3 legend is not justified. The authors should use a maximum of 2 significant digits.

      Thank you for your kind suggestion. We have verified the relevant data and retained two significant digits.

      (2) The authors should use the words "mutant" or "mutation" only when discussing DNA, but when discussing protein, the words "variant" and "substitution" should be used instead as these are more appropriate.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript as you suggested.

      (3) Lines 102-110 are a long, run-on sentence that should be split into shorter sentences. Similarly, lines 367-378 should be split into shorter sentences.

      Thank you for your suggestions. In the revised manuscript, the long sentences in lines 102-110 and 367-378 have been rewritten into shorter ones.

      (4) Lines 174-175: His, Tyr, Glu, and Trp are not positively charged residues and this wording should be changed.

      Thank you for your suggestions. We have revised the relevant description in the manuscript as you suggested.

      (5) Lines 423-426 require a reference.

      Thank you for your suggestion. We have provided the reference at the right position and revised the relevant description in the manuscript as you suggested.

      (6) Grammar/language:

      -line 90 - change "should emerge" to "likely emerged"

      -line 145 - delete "Finally"

      -line 264 - delete "their"

      -line 265 - delete "active sites"

      -line 265-266 - change to "To confirm this hypothesis, site-directed mutagenesis followed by enzyme activity assay was performed"

      -line 311 - change "residue in the catalytic cavity of GAGase III, which.." to "residue in its catalytic cavity, which..."

      -line 318 - change "affect" to "affected"

      -line 323 - change to "degrading activity of GAGase II remains to be determined outside of the His188 residue"

      -line 345 - delete "assays"

      -line 359 - change to "evidence"

      -line 397 - change "folds" to "3D fold"

      -line 420 - change to "share similar catalytic sites"

      -lines 411, 433 - change "conversed" to "conserved"

      -line 441 - change to "Mutational analysis showed that the His188.."

      -line 450 - delete "which"

      Thank you for your suggestions. The grammatical errors have been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns

      The electron density in your model clearly does not support the placement of a Mn ion. In the GAGase II structure, the placement of the Mn and the placement of waters around it still results in two density peaks of > 12 rmsd. The manuscript suggests that ICP-MS was done but the results of this are not shown anywhere. Please include your ICP-MS data. I see the structures have already been deposited, and if they have been deposited unchanged, please see if you can modify them to actually finish building the models. I don't find your data in Figure 2B particularly convincing that Mn is necessarily important for activity.

      Thank you for your kind comments. ICP-MS is a common method for detecting metal ions within proteins (doi: 10.1016/j.jbc.2023.103047; doi: 10.1074/jbc.RA119.011790), and we therefore used it to determine the type of metal ions within GAGases in this study. In the revised manuscript, the ICP-MS data are presented in “Supplemental Table S1”. The data clearly show that the content of Mn<sup>2+</sup>, rather than that of other metals, is much higher in the test sample than in the negative control, suggesting that Mn<sup>2+</sup> is present in the protein. We agree that the addition of Mn<sup>2+</sup>, like that of the other tested metal ions, does not strongly promote the activity of GAGase II, but the addition of EDTA significantly inhibited the enzyme activity (Figure 2), indicating that a metal ion such as Mn<sup>2+</sup> is necessary for the function of GAGases. Whether the metal ion participates in the catalytic reaction or only stabilizes the structure of the enzyme remains to be explored in future studies.

      Minor Concerns

      (1) Please include CC1/2 in your Table 1.

      Thank you for your kind suggestions. CC1/2 parameters have been added in the revised manuscript (Table 1).

      (2) If possible, please include SDS-PAGE gel images of your purified proteins, particularly for the point mutations. Ideally, you would have done SEC on your mutants to show that the reduction in activity is not due to aggregation/misfolding, but at the very least I would like to see that you have similar levels of purity.

      Thank you for your kind suggestions. As you suggested, we have added SDS-PAGE gel images of purified GAGase II, GAGase III, GAGase VII, and their mutant enzymes to the supplementary data. As shown in Figure S5, site-directed mutagenesis did not affect the soluble expression levels of GAGase II, GAGase III or GAGase VII, indicating that the reduction in activity is not due to aggregation or misfolding. Because of the large number of variants, we used crude enzyme for the activity assays of substrate binding sites, while for some key catalytic residues we purified the corresponding mutant enzymes and then verified their activities by HPLC.

      (3) When referring to your structural predictions, it is not appropriate to say that you used Robetta. Your reference is correct though - you should say that the structures were predicted using RoseTTAfold.

      Thank you for your helpful suggestions. We have revised the relevant description in the manuscript.

      (4) If possible expand on how the shorter/more open active site cavity would result in broader substrate specificity.

      Thank you for your kind comment. In the revised manuscript, figures (Supplemental Figure S2) with surface representations of GAGase II and some representative, structurally similar GAGs/alginate lyases, with the dimensions of the cavity labeled, were added to the supplementary data. Considering the correlation between enzyme specificity and substrate binding sites, we speculated that a shorter substrate binding cavity might allow the enzyme to accommodate a wider variety of substrates, so that the catalytic cavity places fewer restrictions on substrate binding. Unfortunately, we did not succeed in obtaining co-crystals of GAGases with any of the substrates. In future studies, we will try to explain the mechanism of substrate selectivity by growing and solving crystals of the enzyme-substrate complexes or by other means.

      (5) I would put less emphasis on His188 in GAGase III being a strong indicator that this protein represents an evolutionary intermediate between alginate lyases and GAGases.

      Thank you for your comment. The His<sup>188</sup> residue, which is unique compared to other GAGases, is essential for the alginate-degrading activity of GAGase III. Regarding why GAGases are thought to represent a possible evolutionary intermediate between alginate lyases and GAG lyases, phylogenetic analysis demonstrated that GAGases show considerable homology with some identified GAG lyases and alginate lyases (DOI: 10.1016/j.jbc.2024.107466). The similarity in primary structure between some GAG lyases, alginate lyases, and GAGases suggests structural similarities, which are further supported by this study. As structure determines function, structural similarity is often used as a key criterion when studying the evolution of proteins, and GAGase III, which shows significant GAG- and alginate-degrading activity, supports this speculation. Of course, our analysis of the evolutionary relationship between GAGases and identified GAG lyases and alginate lyases, based on structural comparison, is an attempt using existing methods. The conclusions we have drawn remain a hypothesis that requires further evidence to support and validate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript under review investigates the role of periosteal stem cells (P-SSC) in bone marrow regeneration using a whole-bone subcutaneous transplantation model. While the model is somewhat artificial, the findings were interesting, suggesting the migration of periosteal stem cells into the bone marrow and their potential to become bone marrow stromal cells. This indicates a significant plasticity of P-SSC consistent with previous reports using fracture models (Cell Stem Cell 29:1547, Dev Cell 59:1192).

      Major Concerns

      (1) The authors assert that the periosteal layer was completely removed in their model, which is crucial for their conclusions. To substantiate this claim, it is recommended that the authors provide evidence of the successful removal of the entire periosteal stem cell (P-SSC) population. A colony-forming assay, with and without periosteal removal, could serve as a suitable method to demonstrate this.

      We are grateful to the reviewer for this valuable suggestion. The objective of this experiment was to demonstrate that periosteal ablation impairs bone marrow regeneration, a finding that is supported by our results. We expect that ablation of the periosteum would be associated with only a partial decrease in CFU-F activity, given the presence of MSCs in the bone and in the endosteal region of the bone marrow. Therefore, CFU-F assays would be difficult to interpret in this setting. In view of the phenotype obtained, providing proof of concept of the importance of the periosteum, we do not believe that further experiments would strengthen the level of proof of this experiment.

      (2) The observation that P-SSCs do not express Kitl or Cxcl12, while their bone marrow stromal cell (BM-MSC) derivatives do, is a key finding. To strengthen this conclusion, the authors are encouraged to repeat the experiment using Cxcl12 or Scf reporter alleles. Immunofluorescence staining that confirms the migration of periosteal cells and their transformation into Cxcl12- or Scf-reporter-positive cells would significantly enhance the paper's key conclusion.

      Transplantation of periosteum isolated from Cxcl12 or Scf reporter mice into WT bones is an excellent suggestion. Indeed, this experiment would confirm (1) the migration of periosteal SSCs and (2) the expression of Cxcl12 and Scf by BM-MSCs derived from the periosteum. However, our currently available resources preclude the execution of these experiments. Moreover, the use of PostnCre<sup>ER</sup>;Tmt mice represents the optimal approach for tracking and specifically isolating BM-MSCs derived from the periosteum. The expression of Cxcl12 and Scf by BM-MSCs derived from the periosteum has been demonstrated in 2 distinct experimental models (Figures 5 and 6).

      (3) On page 8, line 20, the authors' statement regarding the detection of Periostin+ cells outside the periosteum layer could be misinterpreted due to the use of the periostin antibody. Given that periostin is an extracellular matrix protein, the staining may not accurately represent Periostin-expressing cells but rather the presence of periostin in the extracellular matrix. The authors should revise this section for greater precision.

      We acknowledge and appreciate the reviewer's attention to detail. This is, in fact, an error. Nestin-GFP positive periosteal SSC are seen within the periosteum marked by an anti-periostin antibody labeling the extracellular matrix of the periosteum. The manuscript has been revised to address this inaccuracy on page 9, lines 8-9.

      Reviewer #2 (Public review):

      Summary:

      The authors have established a femur graft model that allows the study of hematopoietic regeneration following transplantation. They have extensively characterized this model, demonstrating the loss of hematopoietic cells from the donor femur following transplantation, with recovery of hematopoiesis from recipient cells. They also show evidence that BM MSCs present in the graft following transplantation are graft-derived. They have utilized this model to show that following transplantation, periosteal cells respond by first expanding, then giving rise to more periosteal SSCs, and then migrating into the marrow to give rise to BM MSCs.

      Strengths:

      These studies are notable in several ways:

      (1) Establishment of a novel femur graft model for the study of hematopoiesis;

      (2) Use of lineage tracing and surgery models to demonstrate that periosteal cells can give rise to BM MSCs.

      We thank the reviewer for noting the novelty of our manuscript.

      Weaknesses:

      There are a few weaknesses. First, the authors do not definitively demonstrate the requirement of periosteal SSC movement into the BM cavity for hematopoietic recovery. Hematopoiesis recovers significantly before 5 months, even before significant P-SSC movement has been shown, and hematopoiesis recovers significantly even when periosteum has been stripped.

      This is an important point. Notably, we can see expansion of P-SSCs by day 8 after femur transplantation and evidence of periosteum-derived SSCs in the bone marrow by day 15, before we can detect any significant hematopoietic recovery (see Figure 3A-C).

      Second, it is not clear how the periosteum is changing in the grafts. Which cells are expanding is unclear, and it is not clear if these cells have already adopted a more MSC-like phenotype prior to entering the marrow space.

      This is an interesting question. To examine early changes in gene expression in periosteal SSCs in grafted femurs, we performed additional RNA sequencing at an earlier time point, 3 days after femur transplantation, comparing host periosteal SSCs, periosteal SSCs from grafted femurs, and host bone marrow MSCs (see new Supplementary Figure S5A-C). At this time point, the three cell populations are already distinct on the PCA plot (Figure S5A), and some periosteal genes are downregulated in the graft P-SSCs (Figure S5B). However, we do not yet see upregulation of Kitl, Cxcl12, or most other BM MSC genes in graft P-SSCs (Figure S5B). Furthermore, gene set enrichment analysis (GSEA) revealed upregulation of cell cycle, DNA replication, and mismatch repair gene signatures, and downregulation of multiple gene signatures, in graft P-SSCs compared to host P-SSCs (Figure S5C). Therefore, we conclude that P-SSCs already undergo some gene expression changes early after femur transplantation, but have not yet fully differentiated into BM MSCs at this time point. This experiment is now discussed on p.10 of the revised manuscript.

      Indeed, given the presence of host-derived endothelial cells in the BM, these studies are reminiscent of prior studies from this group and others that re-endothelialization of the marrow may be much more important for determining hematopoietic regeneration, rather than the P-SSC migration.

      We agree that, as previously shown by our group and others, endothelial regeneration and re-endothelialization may also play an important role in this bone marrow regeneration model. Notably, this model could serve as a valuable tool for analyzing the origin of BM endothelial cells during regeneration. To further illustrate endothelial regeneration, additional images of bone sections from VE-cadherin-cre;tdTomato grafted femurs at 15 days, one month, and five months post-transplantation have been included in the new Figure S3. These images reveal extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month (see Figure S2C). This observation is consistent with the timing of both BM MSC recovery and HSC recovery in the grafts, suggesting the importance of endothelial recovery (see Fig. 1B). A new discussion of these findings has been included on page 6 of the revised manuscript and on page 16 in the discussion section.

      Third, the studies exploring the preferential depletion of BM MSCs vs P-SSCs are difficult to interpret. The single metabolic stress condition chosen was not well-justified, and the use of purified cell populations to study response to stress ex vivo may have introduced artifacts into the system.

      We chose to focus on hypoxia as the main condition in which to analyze the stress response of P-SSCs vs BM MSCs because we reasoned that due to the location of P-SSCs on the outside of the bone, these cells would be exposed to a higher oxygen tension than BM-MSCs, which are located within the bone marrow. Therefore, we wanted to determine whether this exposure to a different oxygen tension would be sufficient to explain the different properties of P-SSCs and BM MSCs. We modified the text on p.11 of the manuscript to explain the rationale for this experiment better.

      Reviewer #3 (Public review):

      Summary:

      Marchand, Akinnola, et al. describe the use of the novel model to study BM regeneration. Here, they harvest intact femurs and subcutaneously graft them into recipient mice. Similar to standard BM regeneration models, there is a rapid decrease in cellularity followed by a gradual recovery over 5 months within the grafts. At 5 months, these grafts have robust HSC activity, similar to HSCs isolated from the host femur. They find that periosteum skeletal stem cells (p-SSCs) are the primary source of BM-MSCs within the grafted femur and that these cells are more resistant to the acute stress of grafting the femur.

      Strengths:

      This is an interesting manuscript that describes a novel model to study BM regeneration. The model has tremendous promise.

      We thank the reviewer for highlighting the novelty and potential of our work.

      Weaknesses:

      The authors claim that grafting intact femurs subcutaneously is a model of BM regeneration and can be used as a replacement for gold standard BM regeneration assays such as sublethal chemo/irradiation. However, there isn't enough explanation as to how this model is equivalent or superior to the traditional models. For instance, the authors claim that this model allows for the study of "BM regeneration in vivo in response to acute injury using genetic tools." This can and has been done numerous times with established, physiologically relevant BM regeneration models. The onus is on the authors to discuss or perform the necessary experiments to justify the use of this model. For example, standard BM regeneration models involve systemic damage that is akin to therapies that require BM regeneration. How is studying the current model that provides only an acute injury more relevant and useful than other models? As it stands, it seems as if the authors could have done all the experiments demonstrating the importance of these p-SSCs in the traditional myelosuppressive BM regeneration models to be more physiologically relevant. Along these lines, the use of a standard BM regeneration model (e.g., sublethal chemo/irradiation) as a critical control is missing and should be included. Even if the control doesn't demonstrate that p-SSCs can contribute to the BM-MSC during regeneration, it will still be important because it could be the justification for using the described model to specifically study p-SSCs' regulation of BM regeneration.

      We appreciate the reviewer raising this important point. We never intended this femur transplantation model of bone marrow injury to replace more established models, such as chemotherapy or irradiation. In fact, we compared the effects of femur transplantation to those of localized bone irradiation on P-SSCs using our Periostin-Cre;Td-Tomato lineage tracing model. We found that irradiation does not induce the migration of Tomato+ P-SSCs from the periosteum to the bone marrow cavity in the way that femur transplantation does, and therefore cannot be used to demonstrate the plasticity of P-SSCs in the same way (see new Supplementary Figure S7D-E). Femur transplantation thus appears to be a more severe form of bone marrow injury and is not similar to other, more established assays of bone marrow injury. We also added this discussion to the revised manuscript on p.14 and in the discussion section on p.17.

      The authors perform some analysis that suggests that grafting a whole femur mimics BM regeneration, but there are many experiments missing from the manuscript that will be necessary to support the use of this model. To demonstrate that this new model mimics current BM regeneration models, the authors need to perform a careful examination of the early kinetics of hematopoietic recovery post-transplant. Complete blood counts should be performed on the grafts, focusing on white blood cells (particularly neutrophils), red blood cells, platelets, all critical indicators of BM regeneration. This analysis should be done at early time points that include weekly analysis for a minimum of 28 days following the graft. Additionally, understanding how and when the vasculature recovers is critical. This is particularly important because it is well-established that if there is a delay in vascular recovery, there is a delay in hematopoietic recovery. As mentioned above, a standard BM regeneration model should be used as a control.

      We concur with the reviewer that hematopoietic recovery is a pivotal aspect of this model. We conducted a time-course analysis of bone marrow and HSC cellularity from day 0 to month 5 post-transplantation (Figure 1B). We also evaluated HSC capacity through bone marrow transplantation from grafted or host femurs (Figures 1D and 1E) and quantified the various hematopoietic cells in the graft after five months (Supplemental Figure 1). In addition, hematopoiesis occurring in the transplanted bone was comprehensively evaluated in another article, currently in revision and available on bioRxiv (Takeishi, S., Marchand, T., Koba, W. R., Borger, D. K., Xu, C., Guha, C., Bergman, A., Frenette, P. S., Gritsman, K., & Steidl, U. (2023). Haematopoietic stem cell numbers are not solely determined by niche availability. bioRxiv: the preprint server for biology, 2023.10.28.564559. https://doi.org/10.1101/2023.10.28.564559). We did not use another assay of bone marrow regeneration as a “control”, since we do not expect to see similar plasticity of periosteal SSCs in such models, as illustrated by the localized irradiation model described in the new Figure S7D-E.

      We agree with the reviewer that endothelial recovery is also likely to be very important for hematopoietic recovery in this model, but this was not the focus of this manuscript. The process of endothelial recovery is likely to be more complex than that of MSC recovery, as our findings indicate that the graft endothelium can arise from both the host and the graft femur (see Fig. 2D). Consequently, further investigation into the mechanisms of endothelial recovery and its contribution to hematopoiesis in this experimental system will be an interesting focus of future work. We believe that this bone transplantation model represents a valuable tool for addressing questions regarding the origin and regeneration mechanisms of bone marrow endothelial cells.

      The contribution of donor and host cells to the BM regeneration of the graft is interesting. Particularly, the chimerism of the vasculature. One can assume that for the graft to undergo BM regeneration, there needs to be the delivery of nutrients into the graft via the vasculature. The chimerism of the vascular network suggests that host endothelial cells anastomose with the graft. Host mice should have their vascular system labeled with a dye such as dextran to determine if anastomosis has occurred. If not, the authors need to explain how this graft survives up to 5 months. If anastomosis does occur, then it is very surprising that the hematopoietic system of the graft is not a chimera because this would essentially be a parabiosis model. This needs to be explained.

      We have included additional images of bone sections from VE-cadherin-cre;tdTomato grafted femurs at 15 days, one month, and five months post-transplantation in the new Figure S3. These images show extensive vascularization of the graft and proximity of UBC-GFP+ donor-derived vessels to VE-cadherin+ host-derived blood vessels in the bone marrow within one month, suggesting a potential anastomosis (Figure S2C). However, it is not surprising that hematopoiesis arises exclusively from the host, as we observed complete death of the hematopoietic cells and BM MSCs in the graft femur within the first 3 days of femur transplantation (see Figure S1A), and we do not see any significant hematopoietic recovery in the grafts until at least 2 months (see Fig. 1B). Therefore, this is not similar to a parabiosis model, as confirmed by our chimerism studies shown in Figure 2D. In addition, these data are consistent with the results reported with the use of ossicles (doi:10.1038/nature09262; doi:10.1016/j.cell.2007.08.025; doi:10.1038/nature07547).

      Most of the data presented for the resistance of p-SSCs to stress suggests DNA damage response. Do p-SSCs demonstrate a higher ability to resolve DNA damage? Do they accumulate less DNA damage? Staining for DNA damage foci or performing comet assays could be done to further define the mechanism of stress resistance properties of p-SSCs.

      This is an interesting question. In our RNA sequencing analysis of graft P-SSCs compared with host P-SSCs we did observe an upregulation of mismatch repair gene signatures by gene set enrichment analysis (GSEA) (new Figure S5C). Therefore, it is possible that P-SSCs do have an altered DNA damage response. However, we are unable to investigate this further at this time.

      Given the importance of BM-MSCs in hematopoiesis and that the majority of the emerging BM-MSCs appear to be derived from p-SSCs, the authors should perform experiments to determine if p-SSC-derived BM-MSCs are critical regulators of BM regeneration. For example, the authors could test this by crossing the Postn-creER mice with iDTR mice to ablate these cells and see if recovery is inhibited or delayed. This should be done with the described periosteum-wrapped femur graft model as well as a control BM regeneration model. Demonstrating that the deletion of these cells affects BM regeneration in both models would further justify the physiological relevance and utility of the femur graft model.

      We thank the reviewer for this excellent suggestion, and we agree that this is an important experiment. However, our attempts to ablate Postn+ cells using the iDTA system were limited by technical difficulties, which we are unable to address at this time.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 2C, the vascular network staining appears to be duplicated, suggesting a possible error in image capture. The authors should replace this image with a different field or an alternative picture to avoid confusion.

      We thank the reviewer for noting this accidental duplication due to an image stitching problem. Figure 2C was replaced by a different image from the same experiment.

      (2) For consistency and clarity, a scale bar should be included in Figure S3E to indicate that the magnification factors of the respective visual fields are identical.

      We thank the reviewer for highlighting this point. The magnification used has been added in the revised Figure.

      (3) In Figure S5B, the difference in normalized Opn mRNA expression relative to Gapdh between steady-state BM-MSCs and P-SSCs seems substantial, which contradicts the "ns" (not significant) label. The authors should verify the accuracy of this labeling.

      We agree with the reviewer that this difference in what is now Figure S6B looks substantial. However, we confirmed that this difference is not statistically significant, likely due to the high variability between replicates in Opn expression in the steady state BM MSCs.

      Reviewer #2 (Recommendations for the authors):

      In order to strengthen the argument that P-SSCs are necessary for hematopoietic recovery, the authors should consider providing the following data:

      (1) In the periosteal stripping experiments, the authors should show if periosteum-derived MSCs are present in the BM throughout the process of hematopoietic recovery (not just at the end of the experiment). If none are present at the end, that would mean that periosteum is not required for hematopoietic recovery, but would still suggest that it is required for optimal hematopoietic recovery. At early time points, it would also be very helpful to demonstrate the composition and amount of endothelium present in the marrow to determine if P-SSC migration and differentiation into MSCs depends on endothelial reconstitution.

      To further examine the vascularization of the transplanted femur at earlier time points, we have added images of femurs grafted into VE-cadherin-cre;tdTomato mice at 15 days and one month post-transplantation in the new Figure S3A and S3B. These images already show extensive vascularization of the graft periosteum, which is stained with an anti-periostin antibody. In addition, we observed anastomoses of host VE-cadherin;Tmt+ blood vessels with graft ubc-GFP+ blood vessels in the grafted periosteum within one month (Figure S3C).

      (2) Studies of the surgical periosteum grafts could benefit from histologic analysis of the BM and its MSC components at earlier time points following grafting since the data provided are only at 5 months. Such studies would allow a better appreciation of the relationship between P-SSC migration into the marrow and hematopoietic recovery.

      We have performed histologic analysis of grafted femurs at multiple early time points, which shows expansion of P-SSCs and their migration into the bone marrow cavity (Figure 3C).

      (3) Studies of stress responses preferably should be performed using intact bone and should characterize P-SSC and BM MSC apoptosis, cell cycle status, differentiation, etc, immediately following shifts to the stress conditions. These studies would be more compelling if performed using additional "stress" conditions likely to represent the graft environment.

      This is an interesting suggestion. However, these types of studies would not be possible in intact bones ex vivo, as P-SSCs are known to migrate out of the bone in culture.