  1. Jul 2025
    1. eLife Assessment

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion. The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli, and the study involves perceptual reports from both humans and one monkey regarding whether there are one or two speeds in the stimulus. The study presents compelling evidence that (on average) MT neurons shift from faster-speed-takes-all at low speeds to representing the average of the two speeds at higher speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

    2. Reviewer #1 (Public review):

      Summary:

      Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

      Strengths

      The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

      Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

      One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

      A classifier using the neuronal response and trained to distinguish single speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

      The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

    3. Reviewer #3 (Public review):

      Summary:

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

      Strengths:

      The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons shift from faster-speed-takes-all at low speeds to representing the average of the two speeds at higher speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information is potentially lost in an average response as described here.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

      Strengths

      The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

      Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

      One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

      A classifier using the neuronal response and trained to distinguish single-speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

      The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

      We thank the Reviewer for the thorough summary of the findings and supportive comments.

      Weaknesses

      While the comparison of the overall pattern of behavioral performance between monkeys and humans is important, some of the detailed comparisons are not well supported by the data. For instance, whether the monkey used the apparent coherence simply wasn't tested and a difference between 4 human subjects and a single monkey subject cannot be tested statistically in a meaningful manner. I recommend removing these observations from the manuscript and leaving it at "The difference between the monkey and human results may be due to species differences or individual variability" (and potentially add that there are differences in the task as well; the monkey received feedback on the correctness of their choice, while the humans did not.)

      Thanks for the suggestion. We agree and have modified the text accordingly. We now state on page 8, lines 189-191, "The difference between the monkey and human results may be due to species differences or individual variability. The differences in behavioral tasks may also play a role – the monkey received feedback on the correctness of the choice, whereas human subjects did not."

      A control experiment aims to show that the "fastest speed takes all" behavior is general by presenting two stimuli that move at fast/slow speeds in orthogonal directions. The claim that these responses also show the "fastest speed takes all" is not well supported by the data. In fact, for directions in which the slow speed leads to the largest response on its own, the population response to the bi-speed stimulus is the average of the response to the components (This is fine. One model can explain all direction tuning curves, which also explains averaging at the directions where the slower speed is stronger). Only for the directions where the fast speed stimulus is the preferred direction is there a bias towards the faster speed (Figure 7A). The quantification of this effect in Figure 7B seems to suggest otherwise, but I suspect that this is driven by the larger amplitude of Rf in Figure 8, and the constraint that ws and wf are constant across directions. The interpretation of this experiment needs to be reconsidered.

      The Reviewer raised a good question. Our model with fixed weights for faster and slower components across stimulus directions provided a parsimonious explanation for the whole tuning curve, regardless of whether the faster component elicited a stronger response than the slower component. Because the model can be well constrained by the measured direction-tuning curves, we did not constrain ws and wf to sum to one, which makes the model more general. The linear weighted summation (LWS) model fits the neuronal responses to the bi-speed stimuli very well, accounting for an average of 91.8% (std = 7.2%) of the response variance across neurons. As suggested by the Reviewer, we now use the normalization model to fit the data with fixed weights across all motion directions. The normalization model also provides a good fit, accounting for an average of 90.5% (std = 7.1%) of the response variance across neurons.
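
      As a rough illustration of a fixed-weight fit of this kind, the sketch below fits R = ws*Rs + wf*Rf to one direction-tuning curve by ordinary least squares, without forcing the weights to sum to one. The Gaussian tuning curves, the weight values, the noise level, and the function name `fit_lws` are assumptions made for illustration only; they are not the authors' data or fitting code.

```python
import numpy as np

def fit_lws(r_slow, r_fast, r_bi):
    """Least-squares fit of r_bi ~ w_s * r_slow + w_f * r_fast.

    The weights are fixed across directions and are NOT forced to sum to one.
    Returns (w_s, w_f, fraction of response variance explained).
    """
    design = np.column_stack([r_slow, r_fast])
    weights, *_ = np.linalg.lstsq(design, r_bi, rcond=None)
    pred = design @ weights
    ss_res = np.sum((r_bi - pred) ** 2)
    ss_tot = np.sum((r_bi - np.mean(r_bi)) ** 2)
    return weights[0], weights[1], 1.0 - ss_res / ss_tot

# Hypothetical tuning curves sampled at 12 vector-average directions (deg).
directions = np.arange(-180, 180, 30)

def tuning(center, peak):
    d = (directions - center + 180) % 360 - 180  # wrapped angular distance
    return peak * np.exp(-0.5 * (d / 40.0) ** 2)

r_slow = tuning(-45, 30.0)  # slower component alone (assumed shape)
r_fast = tuning(45, 30.0)   # faster component alone (assumed shape)

# Synthetic bi-speed response built with assumed weights plus noise.
rng = np.random.default_rng(0)
r_bi = 0.3 * r_slow + 0.6 * r_fast + rng.normal(0, 0.5, directions.size)

w_s, w_f, var_explained = fit_lws(r_slow, r_fast, r_bi)
print(f"w_s={w_s:.2f}, w_f={w_f:.2f}, variance explained={var_explained:.3f}")
```

      Because the two component tuning curves peak at different directions, the two columns of the design matrix are far from collinear, which is what allows both weights to be estimated without the sum-to-one constraint.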

      Note that in the new Figure 8A, at the left side of the tuning curve (i.e., at negative vector average (VA) directions), where the slower component moves in a more preferred direction of the neurons than the faster component, the bi-speed response (red curve) is slightly lower than the average of the component responses (gray curve), indicating a bias toward the weaker faster component. Therefore, the faster-speed bias does not occur only when the faster component moves in the more preferred direction. This can also be seen in the direction-tuning curves of an example neuron that we added to the figure (new Fig. 8B). The peak responses to the slower and faster components were about the same, but the neuron still showed a faster-speed bias. At negative VA directions, the red curve is lower than the response average (gray curve) and is biased toward the weaker (faster) component.

      The faster-speed bias also occurs when the peak response to the slower component is stronger than that to the faster component. As a demonstration, Author response image 1 shows an example MT neuron that has a slow preferred speed (PS = 1.9 deg/s) and was stimulated by two speeds of 1.2 and 4.8 deg/s. The peak response to the faster component (blue) was weaker than that to the slower component (green). However, this neuron showed a strong bias toward the faster component. A normalization model fit with fixed weights for the faster and slower components (black curve) described the neuronal response to both speeds (red) well. This neuron was not included in the neuron population shown in Figure 8 because it was not tested with stimulus speeds of 2.5 and 10 deg/s.

      Author response image 1.

      An example MT neuron was tested with stimulus speeds of 1.2 and 4.8 deg/s. The preferred speed of this neuron was 1.9 deg/s. Fixed weights of 0.59 for the faster component and 0.12 for the slower component described the responses to the bi-speed stimuli well using a normalization model. The neuron showed a faster-speed bias although its peak response to the slower component was higher than that to the faster component.

      We modified the text to clarify these points:

      Page 19, lines 405-410, “The bi-speed response was biased toward the faster component regardless of whether the response to the faster component was stronger (in positive VA directions) or weaker (in negative VA directions) than that to the slower component (Fig. 8A). The result from an example neuron further demonstrated that, even when the peak firing rates of the faster and slower component responses were similar, the response elicited by the bi-speed stimuli was still biased toward the faster component (Fig. 8B).”

      Page 19, lines 421-427, “Because the model can be well constrained by the measured direction-tuning curves, it is not necessary to require ws and wf to sum to one, which is more general. An implicit assumption of the model is that, at a given pair of stimulus speeds, the response weights for the slower and faster components are fixed across motion directions. The model fitted MT responses very well, accounting for an average of 91.8% of the response variance (std = 7.2%, N = 21) (see Methods). The success of the model supports the assumption that the response weights are fixed across motion directions.”

      Reviewer #2 (Public Review):

      Summary:

      This is a paper about the segmentation of visual stimuli based on speed cues. The experimental stimuli are random dot fields in which each dot moves at one of two velocities. By varying the difference between the two speeds, as well as the mean of the two speeds, the authors estimate the capacity of observers (human and non-human primates) to segment overlapping motion stimuli. Consistent with previous work, perceptual segmentation ability depends on the mean of the two speeds. Recordings from area MT in monkeys show that the neuronal population to compound stimuli often shows a bias towards the faster-speed stimuli. This bias can be accounted for with a computational model that modulates single-neuron firing rates by the speed preferences of the population. The authors also test the capacity of a linear classifier to produce the psychophysical results from the MT data.

      Strengths:

      Overall, this is a thorough treatment of the question of visual segmentation with speed cues. Previous work has mostly focused on other kinds of cues (direction, disparity, color), so the neurophysiological results are novel. The connection between MT activity and perceptual segmentation is potentially interesting, particularly as it relates to existing hypotheses about population coding.

      We thank the Reviewer for the summary and comments.

      Weaknesses:

      Page 10: The relationship between (R-Rs) and (Rf-Rs) is described as "remarkably linear". I don't actually find this surprising, as the same term (Rs) appears on both the x- and y-axes. The R^2 values are a bit misleading for this reason.

      The Reviewer is correct that subtracting a common term Rs from R and Rf would introduce correlation between (R-Rs) and (Rf-Rs). To address this concern, we conducted an additional analysis. We showed that, at most speed pairs, the R^2 values between (R-Rs) and (Rf-Rs) based on the data are significantly higher than the R^2 values between (R'-Rs) and (Rf-Rs), in which R' was a random combination of Rs and Rf. Since the same Rs was commonly subtracted in calculating R^2 (data) and R^2 (simulation), the difference between R^2 (data) and R^2 (simulation) suggests that the response pattern of R contributes to the additional correlation.

      We now acknowledge this confounding factor and describe the new analysis results on page 14, lines 309-326. Please also see the response to Reviewer 3 about a similar concern.
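
      The size of the correlation that a shared subtracted term can induce on its own can be gauged with a small simulation. The sketch below is not the authors' control analysis (which used a random combination of Rs and Rf); it is a simpler baseline in which R, Rs, and Rf are drawn independently with equal variance, so any correlation between (R-Rs) and (Rf-Rs) comes only from the common Rs. All distribution parameters are assumptions chosen for illustration.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1] ** 2

rng = np.random.default_rng(1)
n = 100_000

# Three mutually independent simulated responses with equal variance.
r_bi = rng.normal(20.0, 5.0, n)    # stands in for R
r_slow = rng.normal(20.0, 5.0, n)  # stands in for Rs
r_fast = rng.normal(20.0, 5.0, n)  # stands in for Rf

# Without the shared term there is essentially no correlation.
r2_raw = r_squared(r_bi, r_fast)

# Subtracting the same Rs from both axes induces correlation by construction.
r2_shared = r_squared(r_bi - r_slow, r_fast - r_slow)

print(f"R^2 without shared term: {r2_raw:.4f}")
print(f"R^2 with shared Rs subtracted: {r2_shared:.4f}")
```

      With equal variances, the covariance of the two differences equals Var(Rs) and each difference has variance 2*Var(Rs), so the expected correlation is 0.5 and the expected R^2 is 0.25. An R^2 computed from data is informative only to the extent that it exceeds a baseline of this kind.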

      Figure 9: I'm confused about the linear classifier section of the paper. The idea makes sense - the goal is to relate the neuronal recordings to the psychophysical data. However the results generally provide a poor quantitative match to the psychophysical data. There is mention of a "different paper" (page 26) involving a separate decoding study, as well as a preprint by Huang et al. (2023) that has better decoding results. But the Huang et al. preprint appears to be identical to the current manuscript, in that neither has a Figure 12, 13, or 14. The text also says (page 26) that the current paper is not really a decoding study, but the linear classifier (Figure 9F) is a decoder, as noted on page 10. It sounds like something got mixed up in the production of two or more papers from the same dataset.

      We apologize for the confusion regarding the reference of Huang et al. (2023, bioRxiv). We referred to an earlier version of this bioRxiv manuscript (version 1), which included decoding analysis. In the bibliography, we provided two URLs for this pre-print. While the second link was correct, the first URL automatically links to the latest version (version 2), which did not have the abovementioned decoding analysis.

      The analysis in Figure 9 applies a classifier to discriminate two-speed from single-speed stimuli, which is a decoding analysis, as the Reviewer pointed out. We revised the Results section about the classifier to make clear what the classifier can and cannot explain (pages 22-23, lines 516-534). We also included a sentence at the end of this section that leads to an additional decoding analysis to extract motion speed(s) from MT population responses (page 23, lines 541-543): “To directly evaluate whether the population neural responses elicited by the bi-speed stimulus carry information about two speeds, it is important to conduct a decoding analysis to extract speed(s) from MT population responses.”

      In any case, I think that some kind of decoding analysis would really strengthen the current paper by linking the physiology to the psychophysics, but given the limitations of the linear classifier, a more sophisticated approach might be necessary -- see for example Zemel, Dayan, and Pouget, 1998. The authors might also want to check out closely related work by Treue et al. (Nature Neuroscience 2000) and Watamaniuk and Duchon (1992).

      We thank the Reviewer for the suggestion and agree that it is useful to incorporate additional decoding analysis that can better link physiology results to psychophysics. The decoding analysis we conducted was motivated by the framework proposed by Zemel, Dayan, and Pouget (1998), and also similar to the idea briefly mentioned in the Discussion of Treue et al. (2000). We have added the decoding analysis to this paper on pages 25-32.

      What do we learn from the normalization model? Its formulation is mostly a restatement of the results - that the faster and slower speeds differentially affect the combined response. This hypothesis is stated quantitatively in equation 8, which seems to provide a perfectly adequate account of the data. The normalization model in equation 10 is effectively the same hypothesis, with the mean population response interposed - it's not clear how much the actual tuning curve in Figure 10A even matters, since the main effect of the model is to flatten it out by averaging the functions in Figure 10B. Although the fit to the data is reasonable, the model uses 4 parameters to fit 5 data points and is likely underconstrained; the parameters other than alpha should at least be reported, as it would seem that sigma is actually the most important one. And I think it would help to examine how robust the statistical results are to different assumptions about the normalization pool.

      In the linear weighted summation (LWS) model (Eq. 8), the weights Ws and Wf are free parameters. We think the value of the normalization model (Eq. 9) is that it provides an explanation of what determines the response weights. We agree with the Reviewer that using the normalization model (Eq. 9) with 4 parameters to fit 5 data points of the tuning curves to bi-speed stimuli of individual neurons is under-constrained. We, therefore, removed the section using the normalization model to fit overlapping stimuli moving in the same direction at different speeds.

      A better way to constrain the normalization model is to use the full direction-tuning curves of MT neurons in response to two stimulus components moving in different directions at different speeds, as shown in Figure 8. We now use the normalization model (Eq. 9) to fit this data set (also suggested by Reviewer 1), in addition to the LWS model. We now report the median values of the model parameters of the normalization model, including the exponent n, sigma, alpha, and the constant c. We also compared the normalization model fit with the linear summation (LWS) model. We discuss the limitations of our data set and what needs to be done in future studies. The revisions are on page 20, lines 434-467 in the Results, and pages 34-35, lines 818-829 in Discussion.

      Reviewer #3 (Public Review):

      Summary:

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

      Strengths:

      The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

      Weaknesses:

      My main concern is that the authors are missing an opportunity to make clear that the divisive normalization, while commonly used to describe neural response patterns in visual areas (and which fits the data here), fails on the theoretical front as an explanation for how information about multiple stimuli can be preserved. Thus, there is a bit of a disconnect between the goal of the paper - how does MT represent multiple stimuli? - and the results: mostly averaging responses which, while consistent with divisive normalization, would seem to correspond to the perception of a single intermediate speed. This is in contrast to the psychophysical results which show that subjects can at least distinguish one from two speeds. The paper would be strengthened by grappling with this conundrum in a head-on manner.

      We thank the Reviewer for the constructive comments. We agree with the Reviewer that it is important to connect the encoding of multiple speeds with the perception. The Reviewer also raised an important question regarding whether multiple speeds can be extracted from population neural responses, given the encoding rules characterized in this study.

      It is a hard problem to extract multiple stimulus values from the population neural response. Inspired by the theoretical framework proposed by Zemel et al. (1998), we conducted a detailed decoding study to extract motion speed(s) from MT population responses. We used the decoded speed(s) to perform a discrimination task similar to our psychophysics task and compared the decoder's performance with perception. We found that, at X4 speed difference, we could decode two speeds based on MT response, and the decoder's performance was similar to that of perception. However, at X2 speed difference, except at the slowest speeds of 1.25 and 2.5 deg/s, the decoder cannot extract two speeds and cannot differentiate between a bi-speed stimulus and a single log-mean speed stimulus. We have added the decoding analysis to this paper on pages 25-32. We also discuss the implications and limitations of these results (pages 35-36, lines 852-884).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Classifier:

      One question I have is how the classifier's performance scales with the number of neurons used in the analysis. Here that number is set to the number that was recorded, but it is a free parameter in this analysis. Why does the arbitrary choice of 100 neurons match the animals' performance?

      We apologize for the lack of clarity on this point. The decoding using the classifier was based on the neural responses of the 100 recorded MT neurons in our data set. The number of neurons (100) was not a free parameter. We needed to reconstruct the population neural response based on the responses of the recorded neurons and their preferred speeds (red and black dots in Figure 9A-E).

      We spline-fitted the reconstructed population neural responses (red and black curves in Figure 9A-E). One way to change the number of neurons used for the decoding is to resample N points along the spline-fitted population responses, using N as a free parameter. However, we think it is better to conduct decoding based on the responses from the recorded neurons rather than on interpolated responses. We now clarify on page 22, lines 520-522, that the classification (decoding) was based on the responses of the 100 recorded neurons in our dataset.

      Normalization Model:

      Although the model is phenomenological, a schematic circuit diagram could help the reader understand how this could work (I think this is worthwhile even though the data cannot distinguish among different implementations of divisive normalization).

      Thanks for this suggestion. We agree that a circuit diagram would help the readers understand how the model works. However, as the Reviewer pointed out, our data cannot distinguish between different implementations of the model. For example, divisive normalization can occur on the inputs to MT neurons or on MT neurons themselves. The circuit mechanism of weighting the component responses is not clear either. A schematic circuit diagram then mainly serves to recapitulate the normalization model in Equation 9. We, therefore, choose not to add a schematic circuit diagram at this time. We are interested in developing a circuit model to account for how visual neurons represent multiple stimuli in future studies.

      Another suggestion is that the time courses could be used to constrain the model; the fact that it takes a while after the onset of the slow-speed response for averaging to reveal itself suggests the presence of inertia/hysteresis in the circuit.

      We agree that the time course of MT responses could be used to constrain the model. This is also why we think it is important to document the time course in this paper. We now state in the Results, page 17, lines 354-357:

      “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Two-Direction Experiment:

      Applying the normalization model to this dataset could help determine its generality.

      This is a good point. We now apply the normalization model (Eq. 9) to fit this data set with the full direction tuning curves in response to two stimuli moving in different directions at different speeds. Please also see the response to Reviewer 2 about the normalization model fit.

      The results of the normalization model fit are now described on page 20 and Figure 8A, B, D.

      Reviewer #2 (Recommendations For The Authors):

      In terms of impact, I would say that the presentation is geared largely toward people who go to VSS. To broaden the appeal, the authors might consider a more general formulation of the four hypotheses stated at the bottom of page 3. These are prominent ideas in systems neuroscience - population encoding, Bayesian inference, etc.

      We thank the Reviewer for the suggestion. We have revised the Introduction accordingly on pages 3-4, lines 43-69. Please also see the response to Reviewer 3 about the Introduction.

      Figure 5: It might be helpful to show the predictions for different hypotheses. If the response to the transparent stimulus is equal to that of the faster stimulus, you will have a line with slope 1. If it is equal to the response to the slow stimulus, all points will lie on the x-axis. In between you get lines with slopes less than 1.

      In Figures 5F1 and 5F2, we show dotted lines indicating faster-all (i.e., faster-component-take-all), response averaging, and slower-all (i.e., slower-component-take-all) on the X-axis. We show those labels in between Figs. 5F1 and F2.

      Figure 6: The analysis is not motivated by any particular question, and the results are presented without any quantitation. This section could be better motivated or else removed.

      We now better motivate the section about the response time course on page 16, lines 336-339: “The temporal dynamics of the response bias toward the faster component may provide a useful constraint on the neural model that accounts for this phenomenon. We therefore examined the timecourse of MT response to the bi-speed stimuli. We asked whether the faster-speed bias occurred early in the neuronal response or developed gradually.”

      On page 17, lines 354-357, we also state that “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Equation (9): There appears to be an "S" missing in the denominator.

      We double-checked and did not see a missing "S" in Equation 9, on page 20.

      Reviewer #3 (Recommendations For The Authors):

      This is an impressive study, with the chief strengths being the computational/theoretical motivation and analyses and the inclusion of psychophysics together with primate neurophysiology. The manuscript is well-written and the figures are clear and convincing (with a couple of suggestions detailed below).

      We thank the Reviewer for the comments.

      Specific suggestions:

      (1) Intro para 3

      "It is conceivable that the responses of MT neurons elicited by two motion speeds may follow one of the following rules: (1) averaging the responses elicited by the individual speed components; (2) bias toward the speed component that elicits a stronger response, i.e. "soft-max operation" (Riesenhuber and Poggio, 1999); (3) bias toward the slower speed component, which may better represent the more probable slower speeds in nature scenes (Weiss et al., 2002); (4) bias toward the faster speed component, which may benefit the segmentation of a faster-moving stimulus from a slower background."

      This would be a good place to point out which of these options is likely to preserve vs. lose information and how.

      It seems to me that only #2 is clearly information-preserving, assuming that there are neurons with a variety of different speed preferences such that different neurons will exhibit different "winners". #1 would predict subjects would perceive only an intermediate speed, whereas #3 would predict perceiving only/primarily the slower speed and #4 would predict only/primarily perceiving the faster speed.

      The difference between "only" and "primarily" would depend on whether the biases are complete or only partial. I acknowledge that the behavioral task in the study is not a "report all perceived speeds" task, but rather a 1 vs 2 speeds task, so the behavioral assay is not a direct assessment of the question I'm raising here, but I think it should still be possible to write about the perceptual implications of these different possibilities for encoding in an informative way.

      Thanks for the suggestions. We have revised this paragraph in the Introduction on pages 3-4, lines 43-69.

      (2) Analysis clarifications

      The section "Relationship between the responses to bi-speed stimuli and constituent stimulus components" could use some clarification/rearrangement/polish. I had to read it several times. Possibly, rearrangement, simplification/explanation of nomenclature, and building up from a simpler to a more complex case would help. If I understand correctly, the outcome of the analysis is to obtain a weight value for every combination of slow and fast speeds used. The R's in equation 5 are measured responses, observed on the single stimulus and combined stimulus trials. It was not clear to me if the R's reflect average responses or individual trial responses; this should be clarified. Ws = 1- wf so in essence only 1 weight is computed for each combination. Then, in the subsequent sections of the manuscript, the authors explore whether the weight computed for each stimulus combination is the same or does it vary across conditions. If I have this right, then walking through these steps will aid the reader.

      The Reviewer is correct. We now walk through these steps and better state the rationale for this approach. The R's in Equation 5 are trial-averaged responses, not trial-by-trial responses.

      We have clarified these points on page 13.

      To take a particular example, the sentence "Using this approach to estimate the response weights for individual neurons can be inaccurate because, at each speed pair, the weights are determined only by three data points" struck me as a rather backdoor way to get at the question. Is the estimate noisy? Or does the weighting vary systematically across speeds? I think the authors are arguing the latter; if so, it would be valuable to say so.

      We wanted to estimate the weighting for each speed pair and determine whether the weights change with the stimulus speeds. Indeed, we found that the weights change systematically across speed pairs. The issue was not that the estimate was noisy (see our response to the second paragraph of point 3 below).

      We have clarified this point in the text, on page 13, lines 273–280: “Our goal was to estimate the weights for each speed pair and determine whether the weights change with the stimulus speeds. In our main data set, the two speed components moved in the same direction. To determine the weights w<sub>s</sub> and w<sub>f</sub> for each neuron at each speed pair, we have three data points R, R<sub>s</sub>, and R<sub>f</sub>, which are trial-averaged responses. Since it is not possible to solve for both variables, w<sub>s</sub> and w<sub>f</sub>, from a single equation (Eq. 5) with three data values, we introduced an additional constraint: w<sub>s</sub> + w<sub>f</sub> = 1. While this constraint may not yield the exact weights that would be obtained with a fully determined system, it nevertheless allows us to characterize how the relative weights vary with stimulus speed.”

      (3) Figure 5

      Related to the previous point, Figures 5A-E are subject to a possible confound. When plotting x vs y values, it is critical that the x and y not depend trivially on the same value. Here, the plots are R-Rs and Rf-Rs. Rs, therefore, is contained in both the x and y values. Assume, for the sake of argument, that R and Rf are constants, whereas Rs is drawn from a distribution of random noise. When Rs, by chance, has an extreme negative value, R-Rs and Rf-Rs will be large positive values. The solution to this artificial confound is to split the trials that generate Rs into two halves and subtract one half from R and the other half from Rf. Then, the same noisy draw will not be contributing to both x and y. The above is what is needed if the authors feel strongly about including this analysis.

      The Reviewer is correct that subtracting a common term (R<sub>s</sub>) would introduce a correlation between (R – R<sub>s</sub>) and (R<sub>f</sub> – R<sub>s</sub>) (Reviewer 2 also raised this point). The R's in Equations 5, 6, and 7 (and Figure 5A-E) are trial-averaged responses, so we cannot address the issue by dividing the R's into two halves. Our results showed that the regression slope (W<sub>f</sub>) changed from near 1 to about 0.5 as the stimulus speeds increased, and the correlation coefficient between (R – R<sub>s</sub>) and (R<sub>f</sub> – R<sub>s</sub>) was high at slow stimulus speeds. To determine whether these results can be explained by the confounding factor of subtracting a common term R<sub>s</sub>, rather than by the pattern of R in representing two speeds, we did an additional analysis. We acknowledged the issue and described the new analysis on page 13, lines 303–326:

      “Our results showed that the bi-speed response showed a strong bias toward the faster component when the speeds were slow and changed progressively from a scheme of ‘faster-component-take-all’ to ‘response-averaging’ as the speeds of the two stimulus components increased (Fig. 5F1). We found similar results when the speed separation between the stimulus components was small (×2), although the bias toward the faster component at low stimulus speeds was not as strong as with the ×4 speed separation (Fig. 5A2-F2 and Table 1).

      In the regression between (R – R<sub>s</sub>) and (R<sub>f</sub> – R<sub>s</sub>), R<sub>s</sub> was a common term and therefore could artificially introduce correlations. We wanted to determine whether our estimates of the regression slope (w<sub>f</sub>) and the coefficient of determination (R<sup>2</sup>) can be explained by this confounding factor. At each speed pair and for each neuron from the data sample of the 100 neurons shown in Figure 5, we simulated the response to the bi-speed stimuli (R<sub>e</sub>) as a randomly weighted sum of R<sub>f</sub> and R<sub>s</sub> of the same neuron.

      R<sub>e</sub> = aR<sub>f</sub> + (1 − a)R<sub>s</sub>,

      in which a was a randomly generated weight (between 0 and 1) for R<sub>f</sub>, and the weights for R<sub>f</sub> and R<sub>s</sub> summed to one. We then calculated the regression slope and the correlation coefficient between the simulated R<sub>e</sub> − R<sub>s</sub> and R<sub>f</sub> − R<sub>s</sub> across the 100 neurons. We repeated the process 1000 times and obtained the mean and 95% confidence interval (CI) of the regression slope and the R<sup>2</sup>. The mean slope based on the simulated responses was 0.5 across all speed pairs. The estimated slope (w<sub>f</sub>) based on the data was significantly greater than the simulated slope at slow speeds of 1.25/5 and 2.5/10 (Fig. 5F1), and 1.25/2.5, 2.5/5, and 5/10 degrees/s (Fig. 5F2) (bootstrap test, see p values in Table 1). The estimated R<sup>2</sup> based on the data was also significantly higher than the simulated R<sup>2</sup> for most of the speed pairs (Table 1). These results suggest that the faster-speed bias at the slow stimulus speeds and the consistent response weights across the neuron population at each speed pair are not analysis artifacts.”
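
      The control simulation quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the firing rates are synthetic placeholders, the function names are hypothetical, the regression includes an intercept (the response does not say whether the original fit did), and the interval is a simple percentile interval.

```python
import random


def ols_slope_r2(x, y):
    """Least-squares slope (with intercept) and R^2 of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)


def simulate_common_term_null(r_f, r_s, n_rep=1000, seed=0):
    """Slopes and R^2 values obtained when R_e is a randomly weighted sum."""
    rng = random.Random(seed)
    x = [f - s for f, s in zip(r_f, r_s)]      # R_f - R_s, fixed across repeats
    slopes, r2s = [], []
    for _ in range(n_rep):
        y = []
        for f, s in zip(r_f, r_s):
            a = rng.random()                   # random weight in [0, 1) for R_f
            r_e = a * f + (1.0 - a) * s        # simulated bi-speed response
            y.append(r_e - s)                  # R_e - R_s shares the term R_s
        slope, r2 = ols_slope_r2(x, y)
        slopes.append(slope)
        r2s.append(r2)
    return slopes, r2s


def mean_and_ci95(values):
    """Mean and percentile-based 95% interval of a list of numbers."""
    v = sorted(values)
    n = len(v)
    return sum(v) / n, v[int(0.025 * n)], v[int(0.975 * n) - 1]


if __name__ == "__main__":
    # Synthetic trial-averaged rates for 100 hypothetical neurons (spikes/s).
    rng = random.Random(1)
    r_f = [rng.uniform(5.0, 60.0) for _ in range(100)]
    r_s = [rng.uniform(5.0, 60.0) for _ in range(100)]
    slopes, r2s = simulate_common_term_null(r_f, r_s)
    print("slope mean and 95% interval:", mean_and_ci95(slopes))
    print("R^2 mean and 95% interval:  ", mean_and_ci95(r2s))
```

      Because the random weight is independent of R<sub>f</sub> − R<sub>s</sub>, the simulated slopes cluster around 0.5, which is the baseline the data-derived slopes are compared against.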

      However, I don't see why the analysis is needed at all. Can't Figure 5F be computed on its own? Rather than computing weights from the slopes in 5A-E, just compute the weights from each combination of stimulus conditions for each neuron, subject to the constraint ws=1-wf. I think this would be simpler to follow, not subject to the noise confound described in the previous point, and likely would make writing about the analysis easier.

      We initially tried the suggested approach to determine the weights of the individual neurons. The weights from each speed combination for each neuron are calculated by: w<sub>s</sub> = (R<sub>f</sub> − R)/(R<sub>f</sub> − R<sub>s</sub>), w<sub>f</sub> = (R − R<sub>s</sub>)/(R<sub>f</sub> − R<sub>s</sub>), and w<sub>s</sub> and w<sub>f</sub> sum to 1. R, R<sub>f</sub>, and R<sub>s</sub> are the responses to the same motion direction. Using this approach to estimate response weights for individual neurons can be unreliable, particularly when R<sub>f</sub> and R<sub>s</sub> are similar. This situation often arises when the two speeds fall on opposite sides of the neuron's preferred speed, resulting in a small denominator (R<sub>f</sub> − R<sub>s</sub>) and, consequently, an artificially inflated weight estimate. We therefore used an alternative approach. We estimated the response weights for the neuronal population at each speed pair using linear regression of (R − R<sub>s</sub>) against (R<sub>f</sub> − R<sub>s</sub>). The slope is the weight for the faster component for the population. This approach overcame the difficulty of determining the response weights for single neurons.
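
      The contrast between the two estimators can be illustrated with a short sketch. All numbers are made-up placeholders and the function names are hypothetical. The regression here includes an intercept, which is an assumption, since the response does not specify the fit.

```python
def single_neuron_weight(r, r_f, r_s):
    """Solve R = w_f*R_f + (1 - w_f)*R_s for w_f (one neuron, one speed pair)."""
    return (r - r_s) / (r_f - r_s)


def population_weight(r, r_f, r_s):
    """Slope of (R - R_s) regressed on (R_f - R_s) across neurons."""
    x = [f - s for f, s in zip(r_f, r_s)]
    y = [b - s for b, s in zip(r, r_s)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    true_w_f = 0.7
    # Hypothetical trial-averaged rates (spikes/s); neuron 1 has R_f close to R_s.
    r_f = [40.0, 20.5, 12.0, 55.0]
    r_s = [10.0, 20.0, 30.0, 25.0]
    r = [true_w_f * f + (1 - true_w_f) * s for f, s in zip(r_f, r_s)]
    r[1] += 2.0  # a small measurement error on the neuron with R_f close to R_s

    for i in range(len(r)):
        print("neuron", i, "w_f =", single_neuron_weight(r[i], r_f[i], r_s[i]))
    print("population w_f =", population_weight(r, r_f, r_s))
```

      In this toy case a 2 spikes/s error moves the single-neuron estimate for the neuron with nearly equal R<sub>f</sub> and R<sub>s</sub> far outside the 0 to 1 range, while the population slope is affected much less because that neuron has little leverage in the regression.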

      Nevertheless, if the data provide better constraints, it is possible to estimate the response weights for each speed pair for individual neurons. For example, we can calculate the weights for single neurons by using stimuli that move in different directions at two speeds. By characterizing the full direction tuning curves for R, R<sub>f</sub>, and R<sub>s</sub>, we have sufficient data to constrain the response weights for single neurons, as we did for the speed pair of 2.5 and 10°/s in Figure 8. In future studies, we can use this approach to measure the response weights for single neurons at different speed pairs and average the weights across the neuron population.

      We explain these considerations in the Results (pages 13–14, lines 265–326) and Discussion (pages 34–35, lines 818–829).

      (4) Figure 7

      Bidirectional analysis. It would be helpful to have a bit more explanation for why this analysis is not subject to the ws=1-wf constraint. In Figure 7B, a line could be added to show what ws + wf =1 would look like (i.e. a line with slope -1 going from (0,1) to (1,0)); it looks like these weights are a little outside that line, but there is still a negative trend suggesting competition.

      For the data set in which visual stimuli move in the same direction at different speeds, we included a constraint that W<sub>s</sub> and W<sub>f</sub> sum to 1. This is because one cannot solve for two independent variables (W<sub>s</sub> and W<sub>f</sub>) using one equation, R = W<sub>s</sub> · R<sub>s</sub> + W<sub>f</sub> · R<sub>f</sub>, with three data values (R, R<sub>s</sub>, R<sub>f</sub>).

      In the dataset using bi-directional stimuli (now Fig. 8), we can use the full direction tuning curves to constrain the linear weighted summation (LWS) model and the normalization model. So, we did not need to impose the additional constraint that Ws and Wf sum to one, which makes the fit more general. We now clarify this in the text, on page 19, lines 421–423.

      As suggested, we added a line showing Ws + Wf = 1 for the LWS model fit (Fig. 8C) and the normalization model fit (Fig. 8D) (also see page 21, lines 482–484). Although w<sub>s</sub> and w<sub>f</sub> are not constrained to sum to one in the model fits, the fitted weights are roughly aligned with the dashed lines of Ws + Wf = 1.

      (5) Attention task

      General wording suggestions - a caution against using "attention" as a causal/mechanistic explanation as opposed to a hypothesized cognitive state. For example, "We asked whether the faster-speed bias was due to bottom-attention being drawn toward the faster stimulus component". This could be worded more conservatively as whether the bias is "still present if attention is directed elsewhere" - i.e. a description of the experimental manipulation.

      We intended to test the hypothesis that the faster-speed bias can be explained by attention being automatically drawn to the faster component, thereby enhancing the contribution of the faster component to the bi-speed response. We now state it as a possible explanation to be tested. We changed the subtitle of this section to be more conservative: “Faster-speed bias still present when attention was directed away from the RFs”, on page 18, line 363.

      We also modified the text on page 18, lines 364–367: “One possible explanation for the faster-speed bias may be that bottom-up attention is drawn toward the faster stimulus component, enhancing the response to the faster component. To address this question, we asked whether the faster-speed bias was still present if attention was directed away from the RFs.”

      Relatedly, in the Discussion, the section on "Neural mechanisms", the sentence "The faster-speed bias was not due to an attentional modulation" should be rephrased as something like 'the bias survived or was still present despite an attentional modulation requiring the monkey to attend elsewhere'.

      Our motivation for doing the attention-away experiment was to determine whether a bottom-up attentional modulation can explain the faster-speed bias. We now describe the results as suggested by the Reviewer. But we’d also like to interpret the implications of the results. In Discussion, page 34, lines 789–790, we now state: “We found that the faster-speed bias was still present when attention was directed away from the RFs, suggesting that the faster-speed bias cannot be explained by an attentional modulation.”

      (6) "A model that accounts for the neuronal responses to bi-speed stimuli". This section opens with: "We showed that the neuronal response in MT to a bi-speed stimulus can be described by a weighted sum of the neuron's responses to the individual speed components". "Weighted average" would be more appropriate here, given that ws = 1-wf.

      As mentioned above, the added constraint of Ws+Wf = 1 was only a practical solution for determining the weights for the data set using visual stimuli moving in the same direction. More generally, Ws and Wf do not need to sum to one. As such, we prefer the wording of weighted sum.

      (7) "As we have shown previously using visual stimuli moving transparently in different directions, a classifier's performance of discriminating a bi-directional stimulus from a single-direction stimulus is worse when the encoding rule is response-averaging than biased toward one of the stimulus components" - this is important! Can this be worked into the Introduction?

      Yes, we now also mention this point in the Introduction regarding response averaging on page 4, lines 54–57: “While decoding two stimuli from a unimodal response is theoretically possible (Zemel et al., 1998; Treue et al., 2000), response averaging may result in poorer segmentation compared to encoding schemes that emphasize individual components, as demonstrated in neural coding of overlapping motion directions (Xiao and Huang, 2015).” Also, please see the response to point 1 above.

      (8) Minor, but worth catching now - is the use of initials for human participants consistent with best practices approved at your institution?

      Thanks for checking. The letters are not the initials of the human subjects. They are coded characters. We have clarified it in the legend of Figure 1, on page 7, line 168.

    1. eLife Assessment

      This valuable study uses tools of population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Analyses of computationally predicted human-specific lncRNAs and their genomic targets lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. The conclusions regarding evolutionary acceleration and adaptation, however, only incompletely take data and literature on human/chimpanzee genetics and functional genomics into account.

    2. Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As it had been nearly two years since I last saw the manuscript, I reread the full text to familiarise myself again with the findings presented. While I appreciate the changes made and think they have strengthened the manuscript, I still find parts of it a bit too speculative or hyperbolic. In particular, I think claims of evolutionary acceleration and adaptation require more careful integration with existing human/chimpanzee genetics and functional genomics literature. For example:

      Line 155: "About 5% of genes have significant sequence differences in humans and chimpanzees," This statement needs a citation, and a definition of what is meant by 'significant', especially as multiple lines below instead mention how it's not clear how many differences matter, or which of them, etc.

      line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      line 210: "Based on a recent study that identified 5,984 genes differentially expressed between human-only and chimpanzee-only iPSC lines (Song et al., 2021), we estimated that the top 20% (4248) genes in chimpanzees may well characterize the human-chimpanzee differences" I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below. I also find that my previous concerns with the very disparate numbers of results across the three archaics have not been suitably addressed.

      I also think that there is still too much of a tendency to assume that adaptive evolutionary change is the only driving force behind the observed results. As I've stated before, I do not doubt that lncRNAs contribute in some way to evolutionary divergence between these species, as do other gene regulatory mechanisms; the manuscript leans on it being the sole, or primary, force, however, and that requires much stronger supporting evidence. Examples include, but are not limited to:

      line 230: "These results reveal when and how HS lncRNA-mediated epigenetic regulation influences human evolution." This statement is too speculative.

      Line 268: "yet the overall results agree well with features of human evolution." What does this mean? This section is too short and unclear.

      Line 325: "and form 198876 HS lncRNA-DBS pairs with target transcripts in all tissues." This has not been shown in this paper - sequence based analyses simply identify the *potential* to form pairs.

      Line 423: "Our analyses of these lncRNAs, DBSs, and target genes, including their evolution and interaction, indicate that HS lncRNAs have greatly promoted human evolution by distinctly rewiring gene expression." I do not agree that this conclusion is supported by the findings presented - this would require significant additional evidence in the form of orthogonal datasets.

      I also return briefly to some of my comments before, in particular on the confounding effects of gene length and transcript/isoform number. In their rebuttal the authors argued that there was no need to control for this, but this does in fact matter. A gene with 10 transcripts that differ in the 5' end has 10 times as many chances of having a DBS than a gene with only 1 transcript, or a gene with 10 transcripts but a single annotated TSS. When the analyses are then performed at the gene level, without taking into account the number of transcripts, this could introduce a bias towards genes with more annotated isoforms. Similarly, line 246 focuses on genes with "SNP numbers in CEU, CHB, YRI are 5 times larger than the average." Is this controlled for length of the DBS? All else being equal a longer DBS will have more SNPs than a shorter one. It is therefore not surprising that the same genes that were highlighted above as having 'strong' DBS, where strength is impacted by length, show up here too.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary

      While DNA sequence divergence, differential expression, and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study, the authors computationally predict HSlncRNAs as well as their DNA Binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analysis is straightforward and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      I no longer have any concerns about the manuscript as the authors have addressed my comments in the first round of review.

      We thank the reviewer for the valuable comments, which have helped us improve the manuscript.

      Reviewer #2 (Public Review):

      Lin et al attempt to examine the role of lncRNAs in human evolution in this manuscript. They apply a suite of population genetics and functional genomics analyses that leverage existing data sets and public tools, some of which were previously built by the authors, who clearly have experience with lncRNA binding prediction. However, I worry that there is a lack of suitable methods and/or relevant controls at many points and that the interpretation is too quick to infer selection. While I don't doubt that lncRNAs contribute to the evolution of modern humans, and certainly agree that this is a question worth asking, I think this paper would benefit from a more rigorous approach to tackling it.

      I thank the authors for their revisions to the manuscript; however, I find that the bulk of my comments have not been addressed to my satisfaction. As such, I am afraid I cannot say much more than what I said last time, emphasising some of my concerns with regards to the robustness of some of the analyses presented. I appreciate the new data generated to address some questions, but think it could be better incorporated into the text - not in the discussion, but in the results.

      We thank the reviewer for the careful reading and valuable comments. In this round of revision, we address the two main concerns: (1) there is a lack of suitable methods and/or relevant controls at many points, and (2) the interpretation is too quick to infer selection. Based on these comments, we have carefully revised all sections of the manuscript, including the Introduction, Results, Discussion, and Materials and Methods.

      In addition, we have performed two new analyses. Based on the two analyses, we have added one figure and two sections to Results, two sections to Materials and Methods, one figure to Supplementary Notes, and two tables to Supplementary Tables. These results were obtained using new methods and provided more support to the main conclusion.

      To be thorough, we have revisited the comments made in the first round and respond to them further. The following are point-by-point responses to the comments.

      Since many of the details in the Responses-To-Comments are available in published papers and eLife publishes Responses-To-Comments, we do not greatly revise supplementary notes to avoid ostensibly repeating published materials.

      “lack of suitable methods and/or relevant controls”.

      We carefully chose the methods, thresholds, and controls in the study; now, we provide clearer descriptions and explanations.

      (1) We have expanded the last paragraph in Introduction to briefly introduce the methods, thresholds, and controls.

      (2) In many places in Results and Materials and Methods, revisions are made to describe and justify methods, thresholds, and controls.

      (3) Some methods, thresholds, and controls have good consensus, such as FDR and genome-wide background, but others may not, such as the number of genes that greatly differ between humans and chimpanzees. Now, we describe our reasons for the latter situation. For example, we explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.

      (4) We also carefully choose proper words to make descriptions more accurate.

      Responses to the suggestion “new data generated could be better incorporated into the text”.

      (1) We think that this sentence, “The occurrence of HS lncRNAs and their DBSs may have three situations – (a) HS lncRNAs preceded their DBSs, (b) HS lncRNAs and their DBSs co-occurred, (c) HS lncRNAs succeeded their DBSs. Our results support the third situation and the rewiring hypothesis”, previously in Discussion, fits better in section 2.3. We have revised it and moved it into the second paragraph of section 2.3.

      (2) Our two new analyses generated new data, and we describe them in Results.

      (3) It is possible to move more materials from Supplementary Notes to the main text, but it is probably unnecessary because the main text currently has eight sub-sections, two tables, and four figures.

      Responses to the comment “the interpretation is too quick to infer selection”.

      (1) When using XP-CLR, iSAFE, Tajima's D, Fay-Wu's H, the fixation index (Fst), and linkage disequilibrium (LD) to detect selection signals, we used the widely adopted parameters and thresholds but did not mention this clearly in the original manuscript. Now, in the first sentence of the second paragraph of section 2.4, we add the phrase “with widely-used parameters and thresholds” (more details are available in section 4.7 and Supplementary Notes).

      (2) It is not the first time we used these tests. Actually, we used these tests in two other studies (Tang et al. Uncovering the extensive trade-off between adaptive evolution and disease susceptibility. Cell Rep. 2022; Tang et al. PopTradeOff: A database for exploring population-specificity of adaptive evolution, disease susceptibility, and drug responsiveness. Comput Struct Biotechnol J. 2023). In this manuscript, section 2.5 and section 4.12 describe how we use these tests to detect signals and infer selection. We also cite the above two published papers from which the reader can obtain more details.

      (3) Also, in section 2.4, we stress that “Signals in considerable DBSs were detected by multiple tests, indicating the reliability of the analysis”.

      To further respond to the comments of “lack of suitable methods” and “this paper would benefit from a more rigorous approach to tackling it”, we have performed two new analyses. The results of the new analyses agree well with previous results and provide new support for the main conclusion. The result of section 2.5 is novel and interesting.

      We write in Discussion “Two questions are how mouse-specific lncRNAs specifically rewire gene expression in mice and how human- and mouse-specific rewiring influences the cross-species transcriptional differences”. To investigate whether the rewiring of gene expression by HS lncRNA in humans is accidental in evolution, we have made further genomic and transcriptomic analyses (Lin et al. Intrinsically linked lineage-specificity of transposable elements and lncRNAs reshapes transcriptional regulation species- and tissue-specifically. doi: https://doi.org/10.1101/2024.03.04.583292). To verify the obtained conclusions, we analyzed spermatogenesis data from multiple species and obtained supporting evidence (unpublished).

      I note some specific points that I think would benefit from more rigorous approaches, and suggest possible ways forward for these.

      Much of this work is focused on comparing DNA binding domains in human-unique long-noncoding RNAs and DNA binding sites across the promoters of genes in the human genome, and I think the authors can afford to be a bit more methodical/selective in their processing and filtering steps here. The article begins by searching for orthologues of human lncRNAs to arrive at a set of 66 human-specific lncRNAs, which are then characterised further through the rest of the manuscript. Line 99 describes a binding affinity metric used to separate strong DBS from weak DBS; the methods (line 432) describe this as being the product of the DBS or lncRNA length times the average Identity of the underlying TTSs. This multiplication, in fact, undoes the standardising value of averaging and introduces a clear relationship between the length of a region being tested and its overall score, which in turn is likely to bias all downstream inference, since a long lncRNA with poor average affinity can end up with a higher score than a short one with higher average affinity, and it's not quite clear to me what the biological interpretation of that should be. Why was this metric defined in this way?

      (1) Using RNA:DNA base-pairing rules, other DBS prediction programs return just DBSs with lengths. Using RNA:DNA base-pairing rules and a variant of Smith-Waterman local alignment, LongTarget returns DBSs with lengths and identity values together with DBDs (local alignment makes DBDs and DBSs predicted simultaneously). Thus, instead of measuring lncRNA/DNA binding based on DBS length, we measure lncRNA/DNA binding based on both DBS length and DBD/DBS identity (simply called identity, which is the percentage of paired nucleotides in the RNA and DNA sequences). This allows us to define “binding affinity”. One may think that binding affinity is a more complex function of length and identity. But, according to in vitro studies (see the review Abu Almakarem et al. 2012 and citations therein, and see He et al. 2015 and citations therein), the strength of a triplex is determined by all paired nucleotides (i.e., triplet). Thus, binding affinity=length * identity is biologically reasonable.

      (2) Further, different from predicting DBS upon individual base-pairing rules such as AT-G and CG-C, LongTarget integrates base-pairing rules into rulesets, each covering A, T, C, and G (see the two figures below, which are from He et al 2015). This makes every nucleotide in the RNA and DNA sequences comparable and allows the computation of identity.

      (3) On whether LongTarget may predict unreasonably long DBSs. Three technical features of LongTarget make this highly unlikely (and more unlikely than other programs). The three features are (a) local alignment, (b) gap penalty, and (c) TT penalty (He et al. 2015).

      (4) Some researchers may think that a higher identity threshold (e.g., 0.8 or even higher) makes the predicted DBSs more reliable. This is not true. To explore plausible identity values, we analyzed the distribution of Kcnq1ot1’s DBSs in the large Kcnq1 imprinting region (which contains many known imprinted genes). We found that a high threshold for identity (e.g., 0.8) will make DBSs in many known imprinted genes fail to be predicted. Upon our analysis of many lncRNAs and upon early in vitro experiments, plausible identity values range from 0.4 to 0.8.

      (5) Is it necessary or advisable to define an identity threshold? Since identity values from 0.4 to 0.8 are plausible and identity is a property of a DBS but does not reflect the strength of the whole triplex, it is more reasonable to define a threshold for binding affinity to control predicted DBSs. As explained above, binding affinity = length*identity is a reasonable measure of the strength of a triplex. The default threshold is 60, and given an identity of 0.6 in many triplexes, a DBS with affinity=60 is about 100 bp. Compared with TF binding sites (TFBS), 100 bp is quite long. As we explain in the main text, “taking a DBS of 147 bp as an example, it is extremely unlikely to be generated by chance (p < 8.2e-19 to 1.5e-48)”.
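
      The arithmetic behind the affinity threshold, and the length dependence the reviewer raises, can be checked with a minimal sketch. The function names are hypothetical, identity is taken as a fraction between 0 and 1 as in the values quoted above, and the two example sites are invented for illustration.

```python
def binding_affinity(length_bp, identity):
    """Affinity as defined in the response: DBS length times average identity."""
    return length_bp * identity


def length_at_threshold(threshold, identity):
    """DBS length (bp) at which a site of the given identity reaches the threshold."""
    return threshold / identity


if __name__ == "__main__":
    # Default threshold of 60 at identity 0.6 corresponds to roughly 100 bp.
    print("length at threshold:", length_at_threshold(60, 0.6))

    # Invented examples: a long, low-identity site versus a short, high-identity one.
    long_weak = binding_affinity(300, 0.45)
    short_strong = binding_affinity(150, 0.80)
    print("long/low-identity affinity:  ", long_weak)
    print("short/high-identity affinity:", short_strong)
    print("long site ranks higher:", long_weak > short_strong)
```

      Under this definition the longer site scores higher despite its lower identity, which is the property the reviewer asks the authors to justify.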

      (6) How to validate predicted DBSs? Validation faces these issues. (a) DBDs are predicted on the genome level, but target transcripts are expressed in different tissues and cells. So, no single transcriptomic dataset can validate all predicted DBSs of a lncRNA. Regardless of the technique and cell type used, only a small portion of predicted DBSs can be experimentally captured (validated). (b) The resolution of current experimental techniques is limited; thus, experimentally identified DBSs (i.e., “peaks”) are much longer than computationally predicted DBSs. (c) Experimental results contain false positives and false negatives. So, validation (or performance evaluation) should also consider the ROC curves (Wen et al. 2022).

      (7) As explained above, a long DBS may have a lower binding affinity than a short DBS. A biological interpretation is that the long DBS may accumulate mutations that decrease its binding ability gradually.
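      The affinity measure in points (5) and (7) can be illustrated with a minimal sketch. It assumes only the definition stated above (binding affinity = length*identity, default threshold 60); the function names and the choice of a greater-than-or-equal comparison are hypothetical and are not taken from the prediction software.

```python
def binding_affinity(length_bp, identity):
    """Affinity as defined in the response: DBS length (bp) times identity (0 to 1)."""
    return length_bp * identity


def passes_threshold(length_bp, identity, threshold=60.0):
    """Assumed rule: keep a DBS whose affinity reaches the threshold (default 60)."""
    return binding_affinity(length_bp, identity) >= threshold


# Illustrative values only: a longer DBS with low identity can have a lower
# affinity than a shorter DBS with high identity, as noted in point (7).
long_low_identity = binding_affinity(150, 0.45)   # about 67.5
short_high_identity = binding_affinity(90, 0.8)   # about 72
```

      With these illustrative numbers, a 100 bp DBS at identity 0.6 sits at an affinity of about 60, which matches the "about 100 bp" figure given in point (5).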

      There is also a strong assumption that identified sites will always be bound (line 100), which I disagree is well-supported by additional evidence (lines 109-125). The authors show that predicted NEAT1 and MALAT1 DBS overlap experimentally validated sites for NEAT1, MALAT1, and MEG3, but this is not done systematically, or genome-wide, so it's hard to know if the examples shown are representative, or a best-case scenario.

      (1) We did not make this assumption. Apparently, binding depends on multiple factors, including co-expression of genes and specific cellular context.

      (2) On the second issue, “this is not done systematically, or genome-wide”: we did perform the analysis genome-wide but did not show all results (supplementary fig 2 shows three genomic regions, which are impressively good). In Wen et al. 2022, we describe the overall results.

      It's also not quite clear how overlapping promoters or TSS are treated - are these collapsed into a single instance when calculating genome-wide significance? If, eg, a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one? Since the interaction between the lncRNA and the DBS happens at the DNA level, it seems like not correcting for this uneven distribution of transcripts is likely to skew results, especially when testing against genome-wide distributions, eg in the results presented in sections 5 and 6. I do not think that comparing genes and transcripts putatively bound by the 40 HS lncRNAs to a random draw of 10,000 lncRNA/gene pairs drawn from the remaining ~13500 lncRNAs that are not HS is a fair comparison. Rather, it would be better to do many draws of 40 non-HS lncRNAs and determine an empirical null distribution that way, if possible actively controlling for the overall number of transcripts (also see the following point).

      (1) We predicted DBSs in the promoter region of 179128 Ensembl-annotated transcripts and did not merge DBSs (there is no need to merge them). If multiple transcripts share the same TSS, they may share the same DBS, which is natural.

      (2) If the DBSs of multiple transcripts of a gene overlap, the overlap does not raise a problem for lncRNA/DNA binding analysis in specific tissues because usually only one transcript is expressed in a tissue. Therefore, there is no such situation as “If, e.g., a gene has five isoforms, and these differ in the 3' UTR but their promoter region contains a DBS, is this counted five times, or one?”

      (3) It is unclear to us what “it seems like not correcting for this uneven distribution of transcripts is likely to skew results” means. Regarding testing against genome-wide distributions, statistically, it is beneficial to make many rounds of random draws genome-wide, but this will take a huge amount of time. Since more variables demand more rounds of drawing, to our knowledge, this is not widely practiced in large-scale transcriptomic data analyses.

      (4) If the difference (result) is small and thus calls for rigorous statistical testing, making many rounds of random draws genome-wide is necessary. In our results, “45% of these pairs show a significant expression correlation in specific tissues (Spearman's |rho| >0.3 and FDR <0.05). In contrast, when randomly sampling 10000 pairs of lncRNAs and protein-coding transcripts genome-wide, the percent of pairs showing this level of expression correlation (Spearman's |rho| >0.3 and FDR <0.05) is only 2.3%”.
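      The repeated-draw procedure the reviewer proposes (many draws of 40 non-HS lncRNAs to build an empirical null) could look roughly like the sketch below. This is not the analysis performed in the manuscript. It assumes a hypothetical list `pool_scores` holding one summary value per non-HS lncRNA (for example, the fraction of its target pairs that pass the correlation cutoffs), and all function names are illustrative.

```python
import random


def empirical_null(pool_scores, set_size, n_draws, seed=0):
    """Mean score of n_draws random sets, each drawn without replacement from the pool."""
    rng = random.Random(seed)
    return [
        sum(rng.sample(pool_scores, set_size)) / set_size
        for _ in range(n_draws)
    ]


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual add-one correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

      The add-one correction keeps the p-value from being exactly zero when no random draw reaches the observed value, so the smallest reportable p-value is set by the number of draws.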

      Thresholds for statistical testing are not consistent, or always well justified. For instance, in line 142 GO testing is performed on the top 2000 genes (according to different rankings), but there's no description of the background regions used as controls anywhere, or of why 2000 genes were chosen as a good number to test? Why not 1000, or 500? Are the results overall robust to these (and other) thresholds? Then line 190 the threshold for downstream testing is now the top 20% of genes, etc. I am not opposed to different thresholds in principle, but they should be justified.

      (1) We used the g:Profiler program to perform over-representation analysis to identify enriched GO terms. This analysis is used to determine what pre-defined gene sets (GO terms) are more present (over-represented) in a list of “interesting” genes than what would be expected by chance. Specifically, this analysis is often used to examine whether the majority of genes in a pre-defined gene set fall in the extremes of a list: the top and bottom of the list, for example, may correspond to the largest differences in expression between the two cell types. g:Profiler always takes the whole genome as the reference; that is why we did not mention the whole genome reference. We now add in section 2.2 “(with the whole genome as the reference)”.

      (2) The choice of 2000 rather than 2500 genes is somewhat subjective. We now explain that “About 5% of genes have significant sequence differences in humans and chimpanzees, but more show expression differences due to regulatory sequences. We sorted target genes by their DBS affinity and, to be prudential, chose the top 2000 genes (DBS length>252 bp and binding affinity>151) and bottom 2000 genes (DBS length<60 bp but binding affinity>36) to conduct over-representation analysis”.

      Likewise, comparing Tajima's D values near promoters to genome-wide values is unfair, because promoters are known to be under strong evolutionary constraints relative to background regions; as such it is not surprising that the results of this comparison are significant. A fairer comparison would attempt to better match controls (eg to promoters without HS lncRNA DBS, which I realise may be nearly impossible), or generate empirical p-values via permutation or simulation.

      We used these tests to detect selection signals in DBSs but not in the whole promoter regions. Using promoters without HS lncRNA DBS as the control also has risks because promoter regions contain other kinds of regulatory sequences.

      There are huge differences in the comparisons between the Vindija and Altai Neanderthal genomes that to me suggest some sort of technical bias or the such is at play here. e.g. line 190 reports 1256 genes to have a high distance between the Altai Neanderthal and modern humans, but only 134 Vindija genes reach the same threshold of 0.034. The temporal separation between the two specimens does not seem sufficient to explain this difference, nor the difference between the Altai Denisovan and Neanderthal results (2514 genes for Denisovan), which makes me wonder if it is a technical artefact relating to the quality of the genome builds? It would be worth checking.

      We feel it is hard to know whether or not the temporal separation between these specimens is sufficient to explain the differences because many details of archaic humans and their genomes remain unknown and because mechanisms determining genotype-phenotype relationships remain poorly known. After 0.034 was determined, these numbers of genes were determined accordingly. We chose parameters and thresholds that best suit the most important requirements, but these parameters and thresholds may not best suit other requirements; this is a problem for all large-scale studies.

      Inferring evolution: There are some points of the manuscript where the authors are quick to infer positive selection. I would caution that GTEx contains a lot of different brain tissues, thus finding a brain eQTL is a lot easier than finding a liver eQTL, just because there are more opportunities for it. Likewise, claims in the text and in Tables 1 and 2 about the evolutionary pressures underlying specific genes should be more carefully stated. The same is true when the authors observe high Fst between groups (line 515), which is only one possible cause of high Fst - population differentiation and drift are just as capable of giving rise to it, especially at small sample sizes.

      (1) We add in the Discussion that “Finally, not all detected signals reliably indicate positive selection”.

      (2) Our results are that more signals are detected in CEU and CHB than in YRI; this agrees with all population genetics studies and implies that our results are not wrongly biased because more samples and larger samples were obtained from CEU and CHB.

    1. eLife Assessment

      This important study presents a well-constructed multiscale simulation framework to investigate ATP-driven DNA translocation by prokaryotic SMC complexes, supporting a segment-capture mechanism. The strength of evidence is convincing, highlighting the necessity of a precise balance between electrostatic interactions and hydrogen bonding, as well as the critical role of kleisin asymmetry in ensuring unidirectional movement.

    2. Reviewer #1 (Public review):

      Summary:

      This study used explicit-solvent simulations and coarse-grained models to identify the mechanistic features that allow for the unidirectional motion of SMC on DNA. Shorter explicit-solvent models describe relevant hydrogen bond energetics, which were then encoded in a coarse-grained structure-based model. In the structure-based model, the authors mimic chemical reactions as signaling changes in the energy landscape of the assembly. By cycling through the chemical cycle repeatedly, the authors show how these time-dependent energetic shifts naturally lead SMC to undergo translocation steps along DNA that are on a length scale that has been identified.

      Strengths:

      Simulating large-scale conformational changes in complex assemblies is extremely challenging. This study utilizes highly detailed models to parameterize a coarse-grained model, thereby allowing the simulations to connect the dynamics of precise atomistic-level interactions with a large-scale conformational rearrangement. This study serves as an excellent example of this overall methodology, where future studies may further extend this approach to investigate any number of complex molecular assemblies.

      Weaknesses:

      The only relative weakness is that the text does not always clearly communicate which aspects of the dynamics are expected to be robust. That is, which aspects of the dynamics/energetics are less precisely described by this model? Where are the limits of the models, and why should the results be considered within the range of applicability of the models?

    3. Reviewer #2 (Public review):

      Summary:

      The authors perform coarse grained and all atom simulations to provide a mechanism for loop extrusion that is involved in genome compaction.

      Strengths:

      The simulations are very thoughtful. They provide insights into the translocation process, which is only one of the mechanisms. Much of the analysis is very good. Overall, the study advances the use of simulations in these complicated systems.

      Weaknesses:

      Even the authors point out several limitations, which cannot be easily overcome in the paper because of the paucity of experimental data. Nevertheless, the authors could have done so to illustrate the main assertion that loop extrusion occurs by the motor translocating on DNA. They should mention more clearly that there are alternative theories that have accounted for a number of experimental data.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yamauchi and colleagues combine all-atom and coarse-grained MD simulations to investigate the mechanism of DNA translocation by prokaryotic SMC complexes. Their multiscale approach is well-justified and supports a segment-capture model in which ATP-dependent conformational changes lead to the unidirectional translocation of DNA. A key insight from the study is that asymmetry in the kleisin path enforces directionality. The work introduces an innovative computational framework that captures key features of SMC motor action, including DNA binding, conformational switching, and translocation.

      This work is well executed and timely, and the methodology offers a promising route for probing other large molecular machines where ATP activity is essential.

      Strengths:

      This manuscript introduces an innovative yet simple method that merges all-atom and coarse-grained, purely equilibrium, MD simulations to investigate DNA translocation by SMC complexes, which is triggered by activated ATP processes. Investigating the impact of ATP on large molecular motors like SMC complexes is extremely challenging, as ATP catalyses a series of chemical reactions that take and keep the system out of equilibrium. The authors simulate the ATP cycle by cycling through distinct equilibrium simulations where the force field changes according to whether the system is assumed to be in the disengaged, engaged, and V-shaped states; this is very clever as it avoids attempting to model the non-equilibrium process of ATP hydrolysis explicitly. This equilibrium switching approach is shown to be an effective way to probe the mechanistic consequences of ATP binding and hydrolysis in the SMC complex system.
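      The equilibrium switching idea described above can be sketched with a toy model. This is not the authors' coarse-grained force field: it assumes a single coordinate `x`, a hypothetical harmonic well per state standing in for each state-specific energy landscape, and overdamped Langevin dynamics within each state. All names and parameter values are illustrative.

```python
import math
import random

# Hypothetical well centres, one per state of the ATP cycle (toy values).
STATES = {"disengaged": 0.0, "engaged": 1.0, "v_shaped": 2.0}


def run_state(x, centre, steps, rng, k=1.0, dt=0.01, kT=0.1):
    """Overdamped Langevin dynamics in a harmonic well centred at `centre`."""
    noise = math.sqrt(2.0 * kT * dt)
    for _ in range(steps):
        force = -k * (x - centre)
        x += force * dt + noise * rng.gauss(0.0, 1.0)
    return x


def run_cycles(n_cycles, steps_per_state=2000, seed=0):
    """Cycle through the states; the potential is switched, never driven in time."""
    rng = random.Random(seed)
    x = 0.0
    trajectory = []
    for _ in range(n_cycles):
        for name, centre in STATES.items():
            x = run_state(x, centre, steps_per_state, rng)
            trajectory.append((name, x))
    return trajectory
```

      Within each segment the dynamics are ordinary equilibrium dynamics in a fixed potential; the only out-of-equilibrium ingredient is the switch between potentials, which is the design choice the paragraph above highlights.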

      The simulations reveal several important features of the translocation mechanism. These include identifying that a DNA segment of ~200 bp is captured in the engaged state and pumped forward via coordinated conformational transitions, yielding a translocation step size in good agreement with experimental estimates. Hydrogen bonding between DNA and the top of the ATPase heads is shown to be critical for segment capture, as without it, translocation is shown to fail. Finally, asymmetry in the kleisin subunit path is shown to be responsible for unidirectionality.

      This work highlights how molecular simulations are an excellent complement to experiments, as they can exploit experimental findings to provide high-resolution mechanistic views currently inaccessible to experiments. The findings of these simulations are plausible and expand our understanding of how ATP hydrolysis induces directional motion of the SMC complex.

      Weaknesses:

      There are aspects of the methodology and modelling assumptions that are not clear and could be better justified. The major ones are listed below:

      (1) The all-atom MD simulations involve a 47-bp DNA duplex interacting with the ATPase heads, from which key residues involved in hydrogen bonding are identified. However, DNA mechanics, including flexibility and hydrogen bond formation, are known to be sequence-dependent. The manuscript uses a single arbitrary sequence but does not discuss potential biases. Could the authors comment on how sequence variability might affect binding geometry or the number of hydrogen bonds observed?

      (2) A key feature of the coarse-grained model is the inclusion of a specific hydrogen-bonding potential between DNA and residues on the ATPase heads. The authors select the top 15 hydrogen-bond-forming residues from the all-atom simulations (with contact probability > 0.05), but the rationale for this cutoff is not explained. Also, the strength of hydrogen bonds in coarse-grained models can be sensitive to context. How did the authors calibrate the strength of this interaction relative to electrostatics, and did they test its robustness (e.g., by varying epsilon or residue set)? Could this interaction be too strong or too weak under certain ionic conditions? What happens when salt is changed?

      (3) To enhance sampling, the translocation simulations are run at 300 mM monovalent salt. While this is argued to be physiological for Pyrococcus yayanosii, such a concentration also significantly screens electrostatics, possibly altering the interaction landscape between DNA and protein or among protein domains. This may significantly impact the results of the simulations. Why did the authors not use enhanced sampling methods to sample rare events instead of relying on a high-salt regime to accelerate dynamics?

      (4) Only a small fraction of the simulated trajectories complete successful translocation (e.g., 45 of 770 in one set), and this is attributed to insufficient simulation time. While the authors are transparent about this, it raises questions about the reliability of inferred success rates and about possible artefacts (e.g., DNA trapping in coiled-coil arms). Could the authors explore or at least discuss whether alternative sampling strategies (e.g., Markov State Models, transition path sampling) might address this limitation more systematically?
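      The concern in point (4) about the reliability of inferred success rates can be made concrete with a binomial confidence interval. The sketch below uses the Wilson score interval, a standard choice for small proportions; it is offered as an illustration and is not an analysis from the manuscript. The counts (45 of 770) are those quoted above, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)
    ) / denom
    return centre - half_width, centre + half_width


# Counts quoted in point (4): 45 successful translocations out of 770 trajectories.
low, high = wilson_interval(45, 770)
```

      Such an interval quantifies only sampling uncertainty. It does not address whether trajectories fail because of insufficient simulation time or because of artefacts such as DNA trapping.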

    5. Author Response:

      We thank the reviewers for their insightful comments on our manuscript. We are encouraged by their positive assessment of our multiscale simulation approach and segment-capture mechanism.

      In our revision, we will address the reviewers' primary concerns, which are summarized into three key points: (1) providing a more comprehensive discussion of the validity, robustness, and limitations of our model; (2) improving contextualization with alternative mechanisms; and (3) enhancing the clarity of our results, figures, and terminology.

      1) Model Validity, Robustness, and Limitations:

      As suggested by Reviewers #1 and #3, we will provide a more thorough discussion of our model's assumptions and limitations. This is essential to evaluate the generalizability and reliability of our conclusions. We will clarify which aspects of the dynamics we believe to be robust, elaborate on the rationale behind key parameter choices, such as the selection criteria for hydrogen-bonding residues and the calibration of their interaction strength, and discuss how these choices may influence the simulation outcomes. Furthermore, we will mention the potential impact of our choices regarding DNA sequence, DNA length, and the high-salt concentration, explaining why we opted for this simulation strategy over alternative enhanced-sampling techniques.

      2) Contextualization with Alternative Mechanisms:

      Following the comments by Reviewer #2, we will expand our discussion to better contextualize our work. We will provide a more detailed comparison between our segment-capture model and alternative mechanisms, particularly the 'scrunching' model (e.g., the theoretical work by Takaki et al., Nat. Commun. 2021). This will help clarify how our high-resolution mechanistic view, which reveals the stepwise conformational transitions underlying segment capture, fits into the broader landscape of SMC loop extrusion research. We believe this will contribute to the ongoing scientific discourse.

      3) Clarity of Results, Figures, and Terminology:

      Based on valuable suggestions from Reviewers #2 and #3, we will revise our manuscript to improve the clarity and accessibility of our findings. We will update figures and their descriptions (e.g., Figure 4I, J), providing a clearer step-by-step explanation of the translocation process within the ATP cycle (related to Figure 2), clarifying the role of each conformational state, elucidating how these transitions contribute to the loop extrusion mechanism, and defining key terms such as "pumping" more precisely.

      We are confident that these revisions will substantially strengthen the mechanistic clarity and scientific contribution of our work.

    1. eLife Assessment

      Research on push-pull systems has often focused on controlled environments, leaving significant gaps in our understanding of how these systems function under real-world conditions. This important and solid study makes a substantial contribution by investigating the volatile emissions and behavioral effects of Desmodium in natural and semi-field contexts which offer insights of broad interest for sustainable agriculture and pest management. While the authors rightly acknowledge some remaining limitations, the revised manuscript now provides a well-supported and transparent assessment of the ecological role of Desmodium volatiles in push-pull systems.

    2. Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on FAW oviposition behavior. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, a serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for the improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      (2) It would be good to add "symbols" of significance in Figure 4 (D).

      (3) Figure A is difficult for readers to understand.

      (4) It would be good to discuss in more depth, in the discussion, the functions of the important volatile compounds identified here in comparison with the results of previous studies.

      Comments on revisions:

      The authors addressed all my concerns, and I believe that the current version is appropriate for publication.

    3. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage; see, for example, one of the coauthors' own papers, Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) such that the readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of volatile emission due to physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of volatiles, such as green leaf volatiles (GLVs). We ensured the revised manuscript reflects this nuanced interpretation (lines 282 – 286). However, we also explained more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause (lines 115 – 117). We revised relevant passages throughout the results and discussion to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we provide a new supplementary table S2 that additionally shows the classification of the identified substances, which also shows that the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are monoterpenes, sesquiterpenes, or aromatic compounds, and not GLVs (that are typically emitted following damage).

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      2a The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see eg lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and eg terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls, means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes (lines 115 – 117). We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      2b It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Due to the number of samples we could handle given cost and workload, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We pointed out this justification for our focus on Desmodium in the manuscript (lines 435 – 439). Additionally, we suggested in the discussion that future studies should measure volatile profiles from all plants commonly used in push-pull systems alongside Desmodium (lines 267 – 269).

      2c To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      We very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have greatly increased the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive" solely when referring to findings from the literature (lines 99 – 103), in order to avoid making any definitive claims about our own results.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We think our revisions made the specific aims of this study more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium to which FAW has no tactile access (taste, structure, etc.), FAW chose equally. Similarly in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found either between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element to it by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This is misleading readers into thinking that these interactions were significant and in fact confirming this in the discussion. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we ensured a clear distinction between statistically significant findings and non-significant trends and removed any language that may imply stronger support for the odor hypothesis than what the data show in all the lines that were mentioned.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in the submitted manuscript in lines 275–279. Furthermore, we included detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we pointed out for our specific case. We stated that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering. We would like to point out that this common caveat does not invalidate experimental designs when practicing replication and randomization. We assume that insects are able to select suitable oviposition sites in the background of such confounding factors under realistic conditions.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      5a. The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel wind should be presented as a directional cue. Only around 1.5 l/min enters the wind tunnel in a volume of 90 l approximately, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to a dual choice setup /assay.)

      We agree with these criticisms and changed the terminology accordingly from ‘wind tunnel’ to ‘dual choice assay’. We have now conducted an additional experiment which we called ‘no-choice assay’ that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) was tested separately, with only one treatment conducted per evening to avoid cross-contamination, as described in the methods section of the no-choice assay.

      5b. There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We added this to the discussion (lines 369–374). The new no-choice assays also address this concern by using a setup with laminar flow.

      5c. Too low a flow (1.5 l per minute) implies a largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We added these limitations to the discussion and addressed these concerns with new experiments (see answer 5a).

      5d. The stimulus air enters through a tube (what type of tube, diameter, length, etc) containing pressurized air (how was the air obtained into bags (type of bag, how is it sealed?), and the efflux directly into the flight chamber (how, nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, particularly how the bags were airtight sealed around the plants?

      We added the missing information to the methods and provided details about types of bags, manufacturers, and pre-treatments in the method section. In short, PTFE tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      5e. The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement, which we should assume is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We included a more detailed description of the plant handling and bagging processes to the methods to clarify how the plants were treated during the dual-choice and the no-choice assays reported in the revised manuscript. We politely disagree that the maize plants were damaged and the Desmodium plants not representative of those encountered in the field. The plants were grown in insect-proof screen houses to prevent damage by insects and carefully curved without damaging them to fit into the bag. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.
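
      To make the air-exchange argument in point 5c concrete, the following is a minimal sketch and not part of the manuscript or the review. It assumes the approximate figures quoted above (about 1.5 l/min entering a chamber of about 90 l) and an idealized, perfectly mixed chamber in which old air is diluted exponentially; the function names are hypothetical.

```python
import math

# Assumed values, taken from the approximate figures quoted in the review.
FLOW_L_PER_MIN = 1.5   # fresh air entering the chamber
VOLUME_L = 90.0        # approximate chamber volume


def fraction_replaced(flow_l_per_min, volume_l, minutes):
    """Fraction of the original air replaced after `minutes`.

    Assumes perfect mixing, so the old air decays as exp(-Q * t / V).
    """
    return 1.0 - math.exp(-flow_l_per_min * minutes / volume_l)


def minutes_to_replace(flow_l_per_min, volume_l, fraction):
    """Time needed to replace `fraction` (0 < fraction < 1) of the old air."""
    return -(volume_l / flow_l_per_min) * math.log(1.0 - fraction)


if __name__ == "__main__":
    # Nominal exchange time: one chamber volume of fresh air delivered.
    print("nominal exchange time (min):", VOLUME_L / FLOW_L_PER_MIN)
    print("fraction replaced in a 5 min trial:",
          round(fraction_replaced(FLOW_L_PER_MIN, VOLUME_L, 5.0), 3))
    print("minutes to replace 95% of the air:",
          round(minutes_to_replace(FLOW_L_PER_MIN, VOLUME_L, 0.95), 1))
```

      Under these assumptions, a 5-minute trial exchanges less than a tenth of the chamber air, which is in line with the concern about carry-over between trials. The actual times depend on the real chamber volume, flow rate, and degree of mixing.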

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence so far has not been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justify a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table.

      Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on behavior of FAW oviposition. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, the serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits detectable or “significant” amounts of volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) That would be good to add "symbols" of significance in Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3, which shows the mixed model applied for oviposition bioassays. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We provided a cross-reference to Table 3 from the figure description of Figure 4 (D) for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We tried to keep our figures as clear as possible.

      (4) It will be good to deeply discuss the functions of important volatile compounds identified here with comparison with results in previous studies in the discussion better.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we provide an additional table S2 which identifies the overlap with volatiles previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The points raised are largely self-explanatory as to what needs to be done to fully resolve them. At a minimum the text needs to be seriously revised to:

      (1) reflect the data obtained.

      (2) reflect on the limitations of their experimental setup and data obtained.

      (3) put the data obtained and its limitations in what these tell us and particularly what not. Ideally, additional headspace measurements are taken, including from herbivory and 'clean' maize and Desmodium (in which there is better control of biotic and abiotic stress), as well as other crops commonly planted as companion crops with maize (but none of them reducing pest pressure).

      Thank you for this summary. Please see our detailed responses above.

      In addition to the main points of critique provided above, I have provided additional comments in the text (https://elife-rp.msubmit.net/elife-rp_files/2024/07/18/00134767/00/134767_0_attach_28_25795_convrt.pdf). These elaborate on the above points and include some new ones too. These are the major points of critique, which I hope the authors can address.

      Thank you very much for these detailed comments.

      Reviewer #2 (Recommendations for the authors):

      It is important to note that the original push-pull system was developed against stemborers and involved Napier grass (still used) around the field, which attracts stemborer moths, and Molasses grass as the intercrop that repels the moths and attracts parasitoids. Later, Molasses grass was replaced by desmodiums because it is a legume that fixes nitrogen and therefore can increase nitrate levels in the soil, but most importantly because it prevents germination of the parasitic Striga weed. The possible repellent effect of desmodium on pests and attraction of natural enemies was never properly tested but assumed, probably to still be able to use the push-pull terminology. This "mistake" should be recognized here and in future publications. It is a real pity that the controversy over the repellent effect of desmodium distracts from the amazing success of the push-pull system, also against the fall armyworm.

      We thank the reviewer for pointing out these issues, which are part of the reason for our Figure 1 and why we would like to keep it. We have described this development of the system in the introduction to better present the push-pull system. Our aim in Figure 1 and Table S1 is to highlight both the evidence of the system's success, and the gaps in our understanding, regarding specifically control of damage from the FAW.

    1. eLife Assessment

      This is a valuable study on how past sensory experiences shape perception across multiple time scales. Using a behavioural task and reanalysed EEG data, the authors identify two unifying mechanisms across time scales: a process resulting in faster responses to expected stimuli modulated by attention to task, and reduced early decoding precision for expected inputs interpreted as dampened feedforward processing. The manipulation to dissociate task-related and unrelated history effects over multiple timescales is novel and promising, but the evidence is incomplete and could be strengthened by clarifying the measures, justifying analyses choices, and the relationship to other work.

    2. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context, at various time scales, affects various psychophysical measures, including reaction times, accuracy, and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Weaknesses:

      Here I make some suggestions for the authors to consider. Most are stylistic, but the issue of precision may be important.

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above).

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so.) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing.
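
      For readers unfamiliar with the description quoted in point (3), a common reading is a two-component mixture model: each response error comes either from a von Mises distribution centred on the target (its concentration, kappa, is what is often reported as precision) or from a uniform distribution over the circle (random guesses, weighted by the guess rate). The sketch below illustrates that reading only. Whether it matches the authors' exact implementation is an assumption, and the function names and the coarse grid-search fit are hypothetical stand-ins for a proper optimizer.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def mixture_pdf(error, kappa, guess_rate):
    """Density of a response error in radians on [-pi, pi).

    (1 - guess_rate) * von Mises(0, kappa) + guess_rate * uniform.
    """
    von_mises = math.exp(kappa * math.cos(error)) / (2.0 * math.pi * bessel_i0(kappa))
    uniform = 1.0 / (2.0 * math.pi)
    return (1.0 - guess_rate) * von_mises + guess_rate * uniform


def neg_log_likelihood(errors, kappa, guess_rate):
    """Negative log-likelihood of the observed errors under the mixture."""
    return -sum(math.log(mixture_pdf(e, kappa, guess_rate)) for e in errors)


def fit_grid(errors, kappas, guess_rates):
    """Return the (kappa, guess_rate) pair on the grid with the best likelihood."""
    best = min(
        (neg_log_likelihood(errors, k, g), k, g)
        for k in kappas
        for g in guess_rates
    )
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical errors: target position already subtracted from each response.
    errors = [0.0, 0.1, -0.1, 0.05, -0.2, 2.5]
    print(fit_grid(errors, kappas=[1.0, 5.0, 20.0], guess_rates=[0.05, 0.2, 0.5]))
```

      In this formulation the errors are residuals (response minus true target position), so a larger kappa means tighter clustering around the target, while the guess rate absorbs responses that are unrelated to it.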

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The goal of assessing history biases over multiple time scales is interesting and resonates with both classic (Treisman & Williams, 1984) and recent work (Fritsche et al., 2020; Gekas et al., 2019). The manipulations used to distinguish task-related, unrelated, and stimulus-related reference frames are original and promising.

      Weaknesses:

      I have several concerns regarding the text, interpretation, and consistency of the results, outlined below:

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects.

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022).

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming.

    1. eLife Assessment

      The authors study how apolipoprotein L1 variants impact inflammation and lipid accumulation in macrophages. The findings will be useful for researchers investigating macrophage metabolism and inflammation. The discovery that the polyamine spermidine in part mediates such effects is interesting, but the supporting evidence for a physiologically relevant role is currently incomplete due to the lack of relevant in vivo studies.

    2. Reviewer #1 (Public review):

      Summary:

      Liu et al. investigated the mechanisms by which apolipoprotein L1 (APOL1) G1 and G2 variants cause inflammation and lipid accumulation in macrophages by bone-marrow-derived macrophages from transgenic mice and human iPS cells. Although these findings are not novel, this work provides solid evidence to prove enhanced inflammation and lipid accumulation in macrophages by APOL1 G1 and G2 variants by a variety of in vitro assays and metabolomics measurements. Further, metabolomics measurements identified that the spermidine synthesis pathway was altered by APOL1 G1 and G2 variants, and the polyamine inhibitor reversed the variants-induced phenotypes.

      Strengths:

      Their hypothesis and choice of experiments in each section were clear and mostly solid. Mitochondrial morphological quantification by transmission electron microscopy images was convincing. The authors confirmed APOL1 localization inside macrophages and built stories based on their findings. Showing relevant positive and negative findings in line with current knowledge of APOL1-variants-driven pathologies, such as cation flux, cGAS-STING pathways, indicates a good rigor.

      Weaknesses:

      Although most methods in this work were solid, the choice of α-difluoromethylornithine (DFMO) as an inhibitor of spermidine synthesis was not direct. It was still unclear if DFMO was reversing the phenotypes by lowering spermidine levels. Seahorse assay results would have avoided potential variabilities in cell densities by normalization. Heatmaps showing RNA-seq results would be appreciated better with a clear description of how the color is defined and calculated.

    3. Reviewer #2 (Public review):

      Summary:

      The G1 and G2 variants of the Apolipoprotein L1 (APOL1) gene are well-established risk factors for chronic kidney disease. While macrophages have been implicated in the pathogenesis of APOL1-mediated kidney diseases (AMKD), the precise impact of the G1 and G2 APOL1 variants on macrophage function and the underlying molecular mechanisms remains insufficiently characterized. In this manuscript, the authors investigate pathological phenotypes in macrophages carrying the G1 and G2 APOL1 variants. They report an accumulation of neutral lipids and activation of pro-inflammatory pathways, which appear to be at least partly driven by an accumulation of the polyamine spermidine and upregulation of the spermidine synthesis pathway. These findings reveal a pro-inflammatory role for G1 and G2 APOL1 in macrophages and identify the spermidine synthesis pathway as a potential therapeutic target.

      Strengths:

      The authors employ a comprehensive set of approaches to characterize macrophage phenotypes, including assessments of lipid accumulation, pro-inflammatory cytokine release, responses to M2-polarizing cytokines, autophagy, mitochondrial function, and metabolic profiling. The reversal of pathological phenotypes in G1 and G2 APOL1 macrophages by the polyamine synthesis inhibitor α-difluoromethylornithine provides compelling evidence supporting a causal role for spermidine in mediating APOL1 variant-associated dysfunction. Furthermore, the inclusion of both mouse and human models strengthens the translational relevance of the findings.

      Weaknesses:

      The manuscript would benefit from a clearer articulation of the specific role macrophages play in the pathogenesis of APOL1-associated kidney diseases to better emphasize the significance of the study. Additionally, the experimental design lacks a clear, logical progression, and the rationale behind some experiments is insufficiently justified, making certain conclusions difficult to fully support based on the presented data. Given the availability of established animal models of APOL1-associated kidney diseases, it is unclear why the authors chose to derive macrophages from the bone marrow of G1 and G2 APOL1 mice for in vitro assays rather than isolating and testing macrophages in vivo within these models. In vitro assays may exaggerate macrophage responses compared to physiological conditions, which could affect the interpretation of the data. Addressing this point would strengthen the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Liu et al investigate the impact of G1 and G2 variants of the gene encoding Apolipoprotein L1 (APOL1) on macrophage inflammation. The authors have used bone marrow-derived macrophages and human induced pluripotent stem cell-derived macrophages as their model to identify altered immune signaling caused by G1 and G2 APOL1. The unbiased metabolite analysis indicates the possible involvement of altered polyamine metabolism in the regulation of inflammatory response in G1 and G2 macrophages. This study shows that targeting polyamine metabolism can limit macrophage inflammation and lipid accumulation in vitro conditions.

      Strengths:

      This study shows the importance of polyamine metabolism in the regulation of macrophage inflammatory response. The authors showed that spermidine synthesis is closely associated with altered macrophage functions with two risk-variant forms of APOL1 (G1 and G2). The altered macrophage lipid metabolism is known to be associated with macrophage dysfunction in G1 and G2 APOL1. However, the involvement of polyamine in the regulation of lipid accumulation and inflammation in macrophages in G1 and G2 variants is interesting and could be explored as a novel therapeutic approach for chronic inflammation.

      Weaknesses:

      The novelty of this study lies in the association of polyamine metabolism with lipid metabolism dysregulation in macrophages. The weakness of the manuscript is that there are insufficient experiments to support the claim that polyamine metabolism is involved in the regulation of macrophage inflammation, which undermines the novelty of this study. The authors performed in vitro experiments targeting spermidine synthesis to show reduced inflammation and lipid accumulation, but have not performed any in vivo analysis of chronic kidney inflammation progression in G1 and G2 mice, which they have used to generate bone-marrow-derived macrophages. They have not shown any data that supports the specificity of DFMO in targeting spermidine synthesis.

    1. eLife Assessment

      This study presents a valuable finding of novel markers that may potentially identify resident tendon stem/progenitor cells (TSPCs). The study also presents a comprehensive single-cell transcriptional dataset that will be of value to the field. The evidence supporting the identification of novel markers of a TSPC is incomplete, requiring clarification of current analyses and additional validation experiments to demonstrate that these markers are indeed specific and these cells are indeed TSPCs. This work will be of interest to biologists and engineers focused on tendons and ligaments.

    2. Reviewer #1 (Public review):

      This study is focused on identifying unique, innovative surface markers for mature Achilles tendons by combining the latest multi-omics approaches and in vitro evaluation, which would address the knowledge gap of controversial identity of TPSCs with unspecific surface markers. The use of multi-omics technologies, in vivo characterization, in vitro standard assays of stem cells, and in vitro tissue formation is a strength of this work and could be applied for other stem cell quantification in the musculoskeletal research. The evaluation and identification of Cd55 and Cd248 in TPSCs have not been conducted in tendon, which is considered as innovative. Additionally, the study provided solid sequencing data to confirm co-expressions of Cd55 and Cd248 with other well-described surface markers such as Ly6a, Tpp3, Pdgfra, and Cd34. Generally, the data shown in the manuscript support the claims that the identified surface antigens mark TPSCs in juvenile tendons.

    3. Reviewer #2 (Public review):

      Summary:

      The molecular signature of tendon stem cells is not fully identified. The endogenous location of tendon stem cells within native tendon is also not fully elucidated. Several molecular markers have been identified to isolate tendon stem cells but they lack tendon specificity. Using the declining tendon repair capacity of mature mice, the authors compared the transcriptome landscape and activity of juvenile (2 weeks) and mature (6 weeks) tendon cells of mouse Achilles tendons and identified CD55 and CD248 as novel surface markers for tendon stem cells. CD55+ CD248+ FACS-sorted cells display a preferential tendency to differentiate into tendon cells compared to CD55neg CD248neg cells.

      Strengths:

      The authors generated a lot of data on juvenile and mature Achilles tendons, using scRNAseq, snRNAseq, and ATACseq strategies. This constitutes a resource dataset.

      Weaknesses:

      The analyses and validation of identified genes are not complete and could be pushed further. The endogenous expression of newly-identified genes in native tendons would be informative. The comparison of scRNAseq and snRNAseq datasets for tendon cell populations would strengthen the identification of tendon cell populations.

    4. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This study is focused on identifying unique, innovative surface markers for mature Achilles tendons by combining the latest multi-omics approaches and in vitro evaluation, which would address the knowledge gap of the controversial identity of TPSCs with unspecific surface markers. The use of multi-omics technologies, in vivo characterization, in vitro standard assays of stem cells, and in vitro tissue formation is a strength of this work and could be applied for other stem cell quantification in musculoskeletal research. The evaluation and identification of Cd55 and Cd248 in TPSCs have not been conducted in tendons, which is considered innovative. Additionally, the study provided solid sequencing data to confirm co-expressions of Cd55 and Cd248 with other well-described surface markers such as Ly6a, Tpp3, Pdgfra, and Cd34. Generally, the data shown in the manuscript support the claims that the identified surface antigens mark TPSCs in juvenile tendons.

      However, there are missing links between the scientific questions posed in the Introduction and the Methodology/Results. If the study focuses on unsatisfactory healing responses of mature tendons and understanding of mature TPSCs, at least mature Achilles tendons from mice older than 12 weeks should be examined and compared with tendons from juvenile/neonatal mice. However, neither the 2-week-old nor the 6-week-old mice used for characterization here are skeletally mature. Additionally, there is a lack of complete comparison of TPSCs between 2-week and 6-week-old mice at the transcriptional and epigenetic levels.

      In order to distinguish TPSCs and characterize their epigenetic activities, the authors used scRNA-seq, snRNA-seq, and snATAC-seq approaches. The integration, analysis, and comparison of sequencing data across assays and/or time points are confusing and incomplete. For example, it would be more comprehensive to integrate both scRNA-seq and snRNA-seq data (if not, why were both assays used for Achilles tendons at both the 2-week and 6-week time points?). snRNA-seq and snATAC-seq data of 6-week-old mice were separately analyzed. No comparison of the differences and similarities of TPSCs between 2-week and 6-week-old mice was conducted.

      Given the goal of this work to identify specific TPSC markers, the specificity of Cd55 and Cd248 for TPSCs is not clear. First, based on the data shown here, Cd55 and Cd248 mark the same cell population which is identified by Ly6a, TPPP3, and Pdgfra. Although, for instance, Cd34 is expressed by other tissues as discussed here, no data/evidence is provided by this work showing that Cd55 and Cd248 are not expressed by other musculoskeletal tissues/cells. Second, the immunostaining of Cd55 and Cd248 doesn't support their specificity. What is the advantage of using Cd55 and Cd248 for TPSCs compared to using other markers?

      Reviewer #2 (Public review):

      Summary:

      The molecular signature of tendon stem cells is not fully identified. The endogenous location of tendon stem cells within the native tendon is also not fully elucidated. Several molecular markers have been identified to isolate tendon stem cells but they lack tendon specificity. Using the declining tendon repair capacity of mature mice, the authors compared the transcriptome landscape and activity of juvenile (2 weeks) and mature (6 weeks) tendon cells of mouse Achilles tendons and identified CD55 and CD248 as novel surface markers for tendon stem cells. CD55+ CD248+ FACS-sorted cells display a preferential tendency to differentiate into tendon cells compared to CD55neg CD248neg cells.

      Strengths:

      The authors generated a lot of data on juvenile and mature Achilles tendons, using scRNAseq, snRNAseq, and ATACseq strategies. This constitutes a resource dataset.

      Weaknesses:

      The analyses and validation of identified genes are not complete and could be pushed further. The endogenous expression of newly identified genes in native tendons would be informative. The comparison of scRNAseq and snRNAseq datasets for tendon cell populations would strengthen the identification of tendon cell populations.

      Reviewer #3 (Public review):

      Summary:

      In their report, Tsutsumi et al. use single nucleus transcriptional and chromatin accessibility analyses of mouse Achilles tendon in an attempt to uncover new markers of tendon stem/progenitor cells. They propose CD55 and CD248 as novel markers of tendon stem/progenitor cells.

      Strengths:

      This is an interesting and important research area. The paper is overall well written.

      Weaknesses:

      Major problems:

      (1) It is not clear what tissue exactly is being analyzed. The authors build a story on tendons, but there is little description of the dissection. The authors claim to detect MTJ and cartilage cells, but not bone or muscle cells. The tendon sheath is known to express CD55, so the population of "progenitors" may not be of tendon origin.

      (2) Cluster annotations are seemingly done with a single gene. Names are given to cells without functional or spatial validation. For example, MTJ cells are annotated based on Postn, but it is never shown that Postn is only expressed at the MTJ, and not in other anatomical locations in the tendon.

      (3) The authors compare their data to public data based on interrogating single genes in their dataset. It is now standard practice to integrate datasets (eg, using harmony), or at a minimum using gene signatures built into Seurat (eg AddModuleScore).

      (4) Progenitor populations (SP1, SP2). The authors claim these are progenitors but show very clearly that they express macrophage genes. What are they, macrophages or fibroblasts?

      (5) All omics analysis is done on single data points (from many mice pooled). The authors make many claims on n=1 per group for readouts dependent on sample number (eg frequency of clusters).

      (6) The scRNAseq atlas in Figure 1 is made by analyzing 2W and 6W tendons at the same time. The snRNAseq and ATACseq atlas are built first on 2W data, after which the 6W data is compared. Why use the 2W data as a reference?

      Why not analyze the two time points together as done with the scRNAseq?

      (7) Figure 5: The authors should show the gating strategy for FACS. Were non-fibroblasts excluded (eg, immune cells, endothelia...etc)? Was a dead cell marker used? If not, it is not surprising that fibroblasts form colonies and express fibroblast genes when compared to CD55-CD248- immune cells, dead cells, or debris. Can control genes such as Ptprc or Pecam1 be tested to rule out contamination with other cell types?

      Minor problems:

      (1) Report the important tissue processing details: type of collagenase used. Viability before loading into 10x machine.

      Reviewer #1 (Recommendations for the authors):

      (1) Better healing responses in neonatal mice than mature mice have been well appreciated in the field and differences in ECM environment, immune responses, and cell function might account for varied injury results. However, direct evidence/data between better healing and abundant TSPCs needs to be discussed in the Introduction.

      We agree with this insightful comment. We have now enhanced our introduction to include a more direct discussion of the relationship between better healing responses in neonatal mice and the abundance of TSPCs. We specifically highlighted how Howell et al. (2017) demonstrated that tendons in juvenile mice can regenerate functional tissue after injury, while this ability is lost in mature mice. Based on this observation, we articulated our hypothesis that juvenile mouse tendons likely contain abundant TSPCs, which potentially explains their superior healing capacity. Additionally, we have added a statement emphasizing that "investigating TSPCs biology is important for understanding tendon regeneration and homeostasis" (lines 61-62), which clearly articulates the central role that TSPCs play in tendon repair processes and tissue maintenance.

      (2) 6-week-old mouse Achilles tendons are not mature enough and clinically relevant to understand the deficiency of regenerative capacity of TPSCs for undesired healing. If the goal of this study is to identify TSPCs of mature tendons, evaluation of Achilles tendons from at least 12-week-old mice is more reasonable.

      (3) 40-60 mouse Achilles tendons pooled for one sample seems a lot and there is mixed/missed information about how many total cells were collected for each sample and how they were used for different sequencing assays. This could raise the concern that cell digestion was not complete and possibly abundant resident cells might be missed for sequencing analysis.

      (4) The methods section has necessary information missing, which could create confusion for readers. Which time points are used for scRNA-seq and snATAC-seq? Which time points of cells are integrated and analyzed regarding each assay/combined assays? Why is transcriptional expression evaluated by both scRNA-seq and snRNA-seq and is there any technological difference between the two assays?

      We have thoroughly revised the Methods section to clearly specify which time points were used for each assay (lines 132-133 and 148-149). We have also clarified how cells from different time points were integrated and analyzed (lines 167-170, 179-184 and 494-502). Regarding the use of both scRNA-seq and snRNA-seq, we have explained that this complementary approach allowed us to capture both cytoplasmic and nuclear transcripts, providing a more comprehensive view of gene expression profiles while also enabling direct integration with snATAC-seq data. Comparing the integrated scRNA-seq data (2-week and 6-week) with the snRNA-seq (2-week) clusters confirmed that the clusters in the two data sets are closely correlated. We added the dot plot and correlation data in Supplemental Figure 5. Additionally, we have included comprehensive lists of differentially expressed genes (DEGs) for each identified cluster across all datasets (supplementary tables 1-15), which provide detailed molecular signatures for each cell population and facilitate cross-dataset comparisons.

      (5) snATAC-sequencing data seems to be used to only confirm the findings by snRNA-seq and snATAC-sequencing data is not well explored. This assay directly measures/predicts transcription factor activities and epigenetic changes, which might be more accurate in inferring transcription factors from RNA sequencing data using the R package SCENIC.

      We appreciate the reviewer's insightful comment regarding the utilization of our snATAC-seq data. We agree that snATAC-seq provides valuable direct measurements of chromatin accessibility and transcription factor binding sites that can complement inference-based approaches like SCENIC. To address this concern, we have revised our manuscript to better emphasize the value of our snATAC-seq data in transcription factor activity evaluation. We have modified our text (lines 570-574). This modification emphasizes that our integrated approach leverages the strengths of both methodologies, with snATAC-seq providing direct measurements of chromatin accessibility and transcription factor binding sites that can validate and enhance the inference-based predictions from SCENIC analysis of RNA-seq data.

      (6) The image quality of immunostaining of Cd55 and Cd248 is low. The images show that only part of the tendon sheath has positive staining. Co-localization of Cd55 and Cd248 can't be found.

      We agree with the reviewer regarding the limitations of our immunostaining images. To obtain clearer images, we used paraffin sections for our analysis. Additionally, the antibodies for CD55 and CD248 required different antigen retrieval conditions to work effectively, which unfortunately prevented us from performing co-immunostaining to directly demonstrate co-localization. Despite these technical limitations, we have optimized the processing and imaging parameters to improve the quality of the immunostaining images in Figure 5A. These improved images more clearly demonstrate the expression of CD55 and CD248 in the tendon sheath, although in separate sections. The consistent localization patterns observed in these separate stainings, together with our FACS and functional analyses of double-positive cells, strongly support their co-expression in the same cell population. We have also updated the corresponding Methods section (lines 260-272) to include these optimized immunostaining protocols for better reproducibility.

      (7) Only TEM data of tendon construct formed by sorted cells are shown. Results of mechanical tests will be super helpful to show the capacity of these TPSCs for tendon assembly.

      We appreciate the reviewer's suggestion regarding mechanical testing. We would like to direct the reviewer's attention to Figure 5I in our manuscript, where we have already included tensile strength measurements of the tendon construct. These mechanical test results demonstrate the functional capacity of CD55/CD248+ cells to form tendon-like tissue with appropriate mechanical properties, providing quantitative evidence of their ability for tendon assembly.

      (8) Cells negative for CD55/CD248 could be mixed cell populations, including hematopoietic lineages, cells from tendon mid substance, immune cells, and/or endothelial cells. Under induction of tri-lineage media, these mixed cell populations could possess different, unpredicted phenotypes (shown by no increased gene expression of tenogenic, chondrogenic, and osteogenic markers after induction). Higher tenogenic gene expressions of TPSCs after induction don't mean that TPSCs are induced into tenocytes if compared to unknown cell populations with/without similar induction. Additionally, PCR data in Figure 5 presented as ΔΔCT, with unclear biological meanings, is challenging to interpret.

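      As context for the ΔΔCT values questioned in comment (8), the following is a minimal sketch of how ΔΔCT is conventionally converted to a relative fold change, assuming the standard 2^-ΔΔCT convention with roughly 100% amplification efficiency. Whether the manuscript followed exactly this normalization is not stated here. The function names and all Ct values are hypothetical illustrations, not data from the study.

```python
def delta_delta_ct(ct_target_sample, ct_ref_sample,
                   ct_target_control, ct_ref_control):
    """Return ddCt: the target gene's Ct normalized to a reference gene
    in the sample, minus the same normalized value in the control."""
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    return d_ct_sample - d_ct_control


def fold_change(ddct):
    """Relative expression, assuming product doubles every cycle (2^-ddCt)."""
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target gene vs. reference gene,
# in an induced sample and an uninduced control.
ddct = delta_delta_ct(ct_target_sample=24.0, ct_ref_sample=18.0,
                      ct_target_control=26.0, ct_ref_control=18.0)
print(ddct, fold_change(ddct))
```

      Under this convention a negative ΔΔCT means higher expression in the sample than in the control, so reporting fold change rather than raw ΔΔCT makes the direction and size of the effect easier to read.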

      Reviewer #2 (Recommendations for the authors):

      The aim of this study was to identify novel markers for tendon stem cells. The authors used the fact that tendon cells of juvenile tendons have a greater ability to regenerate versus mature tendons. scRNAseq, snRNAseq, and snATACseq datasets were generated and analyzed in juvenile and mature Achilles tendons (mice).

      The authors generated a lot of data that could be exploited further to show that these two novel surface tendon markers are more tendon-specific than those previously identified. Another concern is that there is no robust data indicative of the endogenous location of CD55+ CD248+ cells in the native tendon. Same comments for the transcription factors regulating the transcription of CD55 and CD248 and that of Scx and Mkx. A validation of the ATACseq data with a location in native tendons would be pertinent.

      The analysis was performed by comparing 2 sub-clusters of the same datasets and not between the two stages. Given the introduction highlighting the differential ability to regenerate between the two stages, the comparison between the two stages was somehow expected. I wonder if there is an explanation for the absence of analysis between the two stages.

      The authors have all the datasets to (bioinformatically) compare scRNAseq and snRNAseq datasets. This comparative analysis would strengthen the clustering of tendon cell populations at both stages. The labeling/identification of clusters associated with tendon cell populations is not obvious. I am surprised that there is no tendon sheath cluster such as endotenon or peritenon. A discussion on the different tendon cell populations (tendon clusters) is lacking.

      (1) Choice of the three markers

      The authors chose three genes known to be markers for tendon stem cells, Tppp3, PdgfRa, and Ly6a, and investigated clusters (or subclusters) that co-express these three genes. Except for Tppp3, the other two genes lack tendon specificity. Ly6a is a stem cell marker and is recognized to be a marker of epi/perimysium in fetal and perinatal stages in mouse limbs (PMID: 39636726). Pdgfra is a generic marker of all connective tissue fibroblasts. Could it be that the identification of the two novel surface markers was biased with this choice? The identification of CD55 and CD248 has been done by comparing DEGs between cluster 4 (SP2) and cluster 1 (SP1). What about an unbiased comparison of both clusters 4 and 1 (or individual clusters) between mature and juvenile samples? The reader expected such a comparison since it was introduced as the rationale of the paper to compare juvenile and mature tendon cells.

      We selected Tppp3, PdgfRa, and Ly6a based on established literature identifying them as TSPC markers (Harvey et al., 2019; Tachibana et al., 2022). While only Tppp3 has tendon specificity, these genes collectively represent reliable TSPC markers currently available.

      Our identification of CD55 and CD248 came from comparing SP2 and SP1 clusters that showed these three markers plus tendon development genes. We did compare juvenile and mature samples as shown in Figure 1G, revealing decreased stem/progenitor marker expression with maturation. Additionally, we performed a comprehensive comparison between 2-week and 6-week samples visualized as a heatmap in Supplemental Figure 3, which clearly demonstrates the transcriptional changes that occur during tendon maturation. We have also provided the complete lists of differentially expressed genes for each identified cluster (supplementary tables 1-15), allowing for unbiased examination of cluster-specific gene signatures across developmental stages.

      Our functional validation confirmed CD55/CD248 positive cells express Tppp3, PdgfRa, and Ly6a while demonstrating high clonogenicity and tenogenic differentiation capacity, confirming their TSPC identity.

      (2) Concerns with cluster identification

      Cluster 11, named as the MTJ cluster, in 2-week scRNAseq datasets was not detected in 6-week scRNAseq datasets (Figure 1A). Does it mean that MTJ disappears at 6 weeks in Achilles tendons? In the snRNAseq, the MTJ cluster was defined on the basis of Postn expression. «Cluster 11, with high Periostin (Postn) expression, was classified as a myotendinous junction (MTJ).» Line 379.

      What is the basis/reference to set a link between Postn and MTJ?

      Could the CA clusters be enthesis clusters? Is there any cartilage in the Achilles tendon?

      If there are MTJ clusters, one could expect to see clusters reflecting tendon attachment to cartilage/bone.

      I am surprised to see no cluster reflecting tendon attachments (endotenon or peritenon).

      Cluster 9 was identified as a proliferating cluster in scRNAseq datasets. Has the cell cycle regression step been performed?

      Thank you for highlighting these important questions about our cluster identification. The MTJ cluster (cluster 11) appears reduced but not absent in 6-week samples. We based our MTJ classification on Postn expression, which is enriched at the myotendinous junction, as documented by Jacobson et al. (2020) in their proteome analysis of myotendinous junctions. We have added this reference to the manuscript to provide clear support for our cluster annotation (lines 400-401).

      Regarding the CA cluster, these cells express chondrogenic markers but are not enthesis clusters. We have revised our manuscript to acknowledge that these could potentially represent enthesis cells, as you suggested (lines 412-414). While Achilles tendons themselves don't contain cartilage, our digestion process likely captured some adjacent cartilaginous tissues from the calcaneus insertion site.

      We acknowledge the absence of clearly defined endotenon/epitenon clusters. We have added more comprehensive explanations about peritenon tissues in our manuscript (lines 431-433 and 584-585), noting that previous studies (Harvey et al., 2019) have reported that Tppp3-positive populations are localized to the peritenon, and our SP clusters might also reflect peritenon-derived cells. This additional context helps clarify the potential tissue origins of our identified cell populations.

      For the proliferating cluster (cluster 9), we confirmed high expression of cell cycle markers (Mki67, Stmn1) but did not perform cell cycle regression to maintain biological relevance of proliferation status in our analysis. We have clarified this methodological decision in the revised Methods section.

      (3) What is the meaning of all these tendon clusters in scRNAseq, snRNAseq, and snATACseq? The authors described 2 or 3 SP clusters (depending on the scRNAseq or snRNAseq datasets), 2 CT clusters, 1 MTJ cluster, and 1 CA cluster. Do genes with enriched expression in these different clusters correspond to different anatomical locations in native tendons? Are there endotenon and peritenon clusters? Is there a correlation between clusters (or subclusters) expressing stem cell markers and peritenon as described for Tppp3?

      Thank you for this important question about the biological significance of our identified clusters. The multiple tendon-related clusters we identified likely represent distinct cellular states and differentiation stages rather than strictly discrete anatomical locations. The SP clusters (stem/progenitor cells) express markers consistent with tendon progenitors reported in the literature, including Tppp3, which has been described in the peritenon. As we mentioned in our response to the previous question, we have added more comprehensive explanations about peritenon tissues in our manuscript (Lines 432-433 and 584-585), noting that previous studies (Harvey et al., 2019) have reported that Tppp3-positive populations are localized to the peritenon, and our SP clusters might reflect peritenon-derived cells. Our immunohistochemistry data in Figure 5A further confirms that CD55/CD248 positive cells are localized primarily to the tendon sheath region, similar to the localization pattern of Tppp3 reported by Harvey et al. (2019). The tenocyte clusters (TC) represent mature tendon cells within the fascicles, and their distinct transcriptional profiles suggest heterogeneity even within mature tenocytes. The MTJ cluster specifically expresses genes enriched at the myotendinous junction, while the CA cluster likely represents cells from the enthesis region, as you suggested. In the revised manuscript, we have clarified this interpretation and added additional discussion about the relationship between cluster identity and anatomical localization, particularly regarding the SP clusters and their correlation with peritenon regions.

      (4) The use of single-cell and single-nuclei RNAseq strategies to analyze tendon cell populations in juvenile and mature tendons is powerful, but the authors do not exploit these double analyses. A comparison between scRNAseq and snRNAseq datasets (2 weeks and 6 weeks) is missing. The similar or different features at the level of the clustering or at the level of gene expression should be explained/shown and discussed. This analysis should strengthen the clustering of tendon cell populations at both stages. In the same line, why are there 3 SP clusters in snRNAseq versus 2 SP clusters in scRNAseq? The MTJ cluster R2-5 expressing Sox9 should be discussed.

      Thank you for highlighting this important gap. We have conducted a comprehensive comparison between scRNA-seq and snRNA-seq datasets, revealing substantial correlation between cell populations identified by both methodologies. We've added a detailed dot plot visualization and correlation heatmap in Supplemental Figure 5 that demonstrates the relationships between clusters across datasets. The additional SP cluster in snRNA-seq likely reflects the greater sensitivity of nuclear RNA sequencing in capturing certain cell states that might be missed during whole-cell isolation. Our analysis shows this SP3 cluster represents a transitional state between stem/progenitor cells and differentiating tenocytes. Regarding the Sox9-expressing MTJ cluster R2-5, we have expanded our discussion in the revised manuscript (lines 500-502) to address this finding, incorporating relevant references (Nagakura et al., 2020) that describe Sox9 expression at the myotendinous junction. This expression pattern suggests that cells at this specialized interface may maintain developmental plasticity between tendon and cartilage fates, which is consistent with the transitional nature of this anatomical region.

      (5) The claim of "high expression of CD55 and CD248 in the tendon sheath" is not supported by the experiments. The images of immunostaining (Figure 5A) are not very convincing. It is not explained if these are sections of 3D-tendon constructs or native tendons. The expression in 3D-tendon constructs is not informative, since tendon sheaths are not present. The endogenous expression of the transcription factors regulating tendon gene expression would be informative to localize tendon stem cells in native tendons.

      Thank you for this important critique. We agree that the original immunostaining images were not sufficiently convincing. To address this, we have used paraffin sections and optimized our staining protocols to improve image quality. It's worth noting that CD55 and CD248 antibodies required different antigen retrieval conditions to work effectively, which unfortunately prevented us from performing co-immunostaining to directly demonstrate co-localization in the same section. Despite these technical limitations, we have significantly improved the quality of the immunostaining images in Figure 5A with enhanced processing and imaging parameters.

      The improved images more clearly demonstrate the preferential expression of CD55 and CD248 in the tendon sheath/peritenon regions. The consistent localization patterns observed in these separate stainings, together with our FACS and functional analyses of double-positive cells, strongly support their co-expression in the same cell population.

      In the revised manuscript, we have also improved the figure legends to clearly indicate the nature of the tissue samples and updated the methods section to provide more detailed protocols for the immunostaining procedures used.

      Your suggestion regarding transcription factor visualization is valuable. While beyond the scope of our current study, we agree that examining the endogenous expression of regulatory transcription factors like Klf3 and Klf4 would provide additional insights into tendon stem cell localization in native tendons, and we plan to pursue this in future work.

      Minor concerns:

      (1) Lines 392-397 «To identify progenitor populations within these clusters, we analyzed expression patterns of previously reported markers Tppp3 and Pdgfra (Harvey et al., 2019; Tachibana, et al., 2022), along with the known stem/progenitor cell marker Ly6a (Holmes et al., 2007; Sung et al., 2008; Hittinger et al., 2013; Sidney et al., 2014; Fang et al., 2022). We identified subclusters within clusters 1 and 4 showing high expression of these genes, which we defined as SP1 and SP2. SP2 exhibited the highest expression of these genes, suggesting it had the strongest progenitor characteristics.» Please cite relevant Figures. Feature and violin plots (scRNAseq) across all cells (not for the only 2 SP1 and SP2 clusters) of Tppp3, Pdgfra and Ly6a are missing.

      Thank you for pointing out this important oversight. We have modified the manuscript to clarify that the text in question describes Figure 1B. Additionally, we have added new feature plots showing the expression of Tppp3, Pdgfra, and Ly6a across all cells in Supplemental Figure 1B.

      (2) The labeling of clusters with numbers in single-cell, single nuclei RNAseq, and ATACseq is difficult to follow.

      We appreciate your feedback on this issue. We recognize that the numerical labeling system across different datasets (scRNA-seq, snRNA-seq, and snATAC-seq) makes it difficult to track the same cell populations. To address this, we have added Supplemental Figure 5, which clearly shows the correspondence between cell populations in single-cell and single-nucleus RNA-seq datasets.

      (3) Figure 1C. It is not clear from the text and Figure legend if the DEGs are for the merged 2 and 6 weeks. If yes, a UMAP of the merged datasets of 2 and 6 weeks would be useful.

      (4) Throughout the text, there are a few sentences with obscure rationale. Here are a few examples (not exhaustive):

      Abstract

      “Combining single-nucleus ATAC and RNA sequencing analyses revealed that Cd55 and Cd248 positive fractions in tendon tissue are TSPCs, with this population decreasing at 6 weeks.”

      The rationale of this sentence is not clear. How can single-nucleus ATAC and RNA sequencing analyses identify Cd55 and Cd248 positive fractions as tendon stem cells?

      Thank you for highlighting this unclear statement in our abstract. We agree that the previous wording did not adequately explain how our sequencing analyses identified CD55 and CD248 positive cells as TSPCs. We have revised this sentence to clarify that our multi-modal approach (combining scRNA-seq, snRNA-seq, and snATAC-seq) enabled us to identify Cd55 and Cd248 positive populations as TSPCs based on their co-expression with established TSPC markers such as Tppp3, Pdgfra, and Ly6a. This comprehensive analysis across different sequencing modalities provided strong evidence for their identity as tendon stem/progenitor cells, which we further validated through functional assays. The revised abstract now more clearly communicates the logical progression of our analysis and findings.

      Line 80-82

      “Cd34 is known to be highly expressed in mouse embryonic limb buds at E14.5 compared to E11.5 (Havis et al., 2014), making it a potential marker for TSPCs.”

      The rationale of this sentence is not clear. How can "the fact to be expressed in E14.5 mouse limbs" be an indicator of being a "potential marker of tendon stem cells"?

      Line 611

      “Recent reports have highlighted the role of the Klf family in limb development (Kult et al., 2021), suggesting its potential importance in tendon differentiation”

      Why does the "role of Klf family in limb development" suggest an "importance in tendon differentiation"?

      Thank you for highlighting this logical gap in our manuscript. You're right that involvement in limb development doesn't necessarily indicate specific importance in tendon differentiation. We've revised this statement to more accurately reflect current knowledge, noting that while Klf factors are involved in limb development, their specific role in tendon differentiation requires further investigation (lines 658-659). This revised text better aligns with our findings of Klf3 and Klf4 expression in tendon progenitor cells without making unsupported claims about their functional significance.

      Reviewer #3 (Recommendations for the authors):

      In addition to the points highlighted above some additional points are listed below.

      (1) Case in point: the authors claim CD55 and CD248 are found at the tendon sheath (line 541), which is not part of the tendon proper (although the IHC seems to show green in the epi/endotenon).

      (2) All cell types seem to express collagen based on Figure 1B, so there is either serious background contamination (e.g., ambient RNA) or an error in data analysis.

      Minor problems:

      (1) The figures are confusingly formatted. It is hard to go between cluster numbers and names. Clusters of similar cell types (e.g., progenitors) are not grouped to facilitate comparison, as ordering is based on cluster number.

      (2) The introduction does not distinguish between findings in mice and man. A lot of confusion in the tendon literature probably arises from interspecies differences, which are rarely addressed.

      We appreciate this important point about species distinctions. We have revised our introduction to clearly identify species-specific findings by adding the term "murine" before TSPC references when discussing mouse studies (lines 64, 66, 70, 75, 100, and 108). We agree that interspecies differences are important considerations in tendon biology research, particularly when translating findings between animal models and humans. Our study focuses specifically on mouse models, and we have been careful not to overgeneralize our conclusions to human tendon biology without appropriate evidence. This clarification helps readers better contextualize our findings within the broader tendon literature landscape.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual-TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript, by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans, limiting the novelty of the reported findings. The presented results should be considered in the context of these prior important findings. The focus on self-citation of their previous work, which used the same approach to measure dual TCR expression in other datasets, limits the discussion of other more relevant and impactful published research in this area. Also, Reference #7 continues to list incorrect authors. The authors do not present a balanced or representative description of the available knowledge about either dual TCR expression by T cells or TCR repertoires of Tregs.

      The approach used follows a template used previously by this group for re-analysis of existing datasets generated by other research groups. The descriptions and interpretations of the data as presented are still shallow, lacking innovative or thoughtful approaches that could provide new insight.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells. The response to this criticism in a previous review is considered non-responsive and does not improve the data or findings.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. The interpretations of the gene expression analyses are somewhat simplistic, focusing on single-gene expression of some genes known to have function in Tregs. However, the investigators continue to miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291). No attempt to define clusters is made. No comparison is made of the proportions of dual TCR cells in transcriptionally-defined clusters. The broad assessment of key genes by single- and dual TCR cells is conceptually interesting, but likely to be confounded by the heterogeneity of the Treg populations. This would need to be addressed and considered to make any analyses meaningful.

      The study design, re-analysis of existing datasets generated by other scientific groups, precludes confirmation of any findings by orthogonal analyses.

    3. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA and TCR-seq data, it claims that 10-20% of dual TCR Treg cells exist in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may play complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Comments on revisions:

      Although the authors have provided brief explanations in response to the reviewers' comments, they do not present any additional analyses that would address the fundamental concerns in a convincing manner. Moreover, the in silico analyses presented in the manuscript alone are insufficient to support the conclusions, and the functional experiments requested by the reviewers have not been conducted.

      In the current rebuttal, while some textual additions have been made to the manuscript, the only substantial revision to the figures appears to be the inclusion of statistical significance annotations (e.g., Fig. 1G, Fig. 3G). These changes do not adequately strengthen the overall data or address the core issues raised.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.
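
      The two quantities named in this response can be illustrated with a minimal sketch. It assumes the standard inverse Simpson index, 1 / sum(p_i^2), where p_i is the fraction of cells in clonotype i, and it reads "expansion cells" as cells belonging to a clonotype observed at least twice. The function names and the toy labels are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def inverse_simpson(clone_sizes):
    """Inverse Simpson index: 1 / sum(p_i^2), with p_i the fraction of cells in clone i."""
    sizes = list(clone_sizes)
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one cell is required")
    return 1.0 / sum((n / total) ** 2 for n in sizes)


def clonal_summary(cell_clonotypes):
    """Summarise a repertoire given one clonotype label per cell.

    Assumption: a clonotype is 'expanded' when it is seen in >= 2 cells, and
    'expanded_cells' counts every cell that belongs to such a clonotype.
    """
    counts = Counter(cell_clonotypes)
    return {
        "cells": len(cell_clonotypes),
        "clonotypes": len(counts),
        "expanded_cells": sum(n for n in counts.values() if n >= 2),
        "inverse_simpson": inverse_simpson(counts.values()),
    }


# Toy repertoire (hypothetical labels): clone A in 2 cells, B in 1, C in 3.
toy = ["A", "A", "B", "C", "C", "C"]
summary = clonal_summary(toy)
```

      Read this way, the Blood figures quoted in the response (958 clonotypes, 1,228 cells, 449 expanded cells) would correspond to 779 singleton clonotypes and 179 expanded clonotypes. Because the index depends on how many cells were sampled, values from repertoires of different sizes are not directly comparable, which is the concern the reviewer raised.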

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications. In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (lines 44 and 175 in R1); Schuldt, 2017, J Immunol, 199:33 (lines 44 and 178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-Treg cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.
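
      For readers unfamiliar with the Padj threshold mentioned here: DESeq2 reports adjusted p-values using the Benjamini-Hochberg procedure by default. The sketch below shows only that adjustment step on hypothetical p-values. It does not reproduce DESeq2's model fitting or its independent filtering, and the function name is an illustrative assumption rather than part of any package.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value at ascending rank k (1-based) out of m tests, the adjusted
    value is min over ranks j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (not taken from the manuscript).
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in padj]
```

      Under this procedure a gene is called significant at Padj < 0.05 only after accounting for the number of genes tested, which is why restricting the comparison to a pre-selected top 10 genes changes what the threshold means.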

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. 16:220; Wyss. 2016. Nat Immunol. 17:1093; Zemmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in development and differentiation of single TCR Treg and dual TCR Treg. The literature suggested by the reviewer indicates that development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissue-specific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal: This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient? We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Generally, because doublet cells may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling to eliminate doublets and negative cells. HTO labeling provides each cell with a barcode that differentiates cells from different tissue sources, so doublets and negative cells can be identified by analyzing the HTO labels. After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.
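
      The filtering and classification logic described in this response can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the `Cell` record, its field names, and the `is_doublet` flag (standing in for a DoubletFinder or HTO call) are hypothetical. Only the 20,000 nCount ceiling and the rule that a cell needs at least one functional α and one functional β chain come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_NCOUNT = 20_000  # ceiling on per-cell counts stated in the response


@dataclass
class Cell:
    barcode: str           # hypothetical identifier
    n_count: int           # total reads/UMIs for the cell
    is_doublet: bool       # hypothetical flag from an upstream doublet/HTO call
    functional_alpha: int  # number of functional TCR alpha chains detected
    functional_beta: int   # number of functional TCR beta chains detected


def passes_qc(cell: Cell) -> bool:
    """Keep cells that are not flagged as doublets and do not exceed the count ceiling."""
    return (not cell.is_doublet) and cell.n_count <= MAX_NCOUNT


def tcr_pairing(cell: Cell) -> str:
    """Classify a cell as 'single', 'dual', or 'excluded' (no complete functional pair)."""
    a, b = cell.functional_alpha, cell.functional_beta
    if a == 0 or b == 0:
        return "excluded"
    if a == 1 and b == 1:
        return "single"
    return "dual"  # e.g. A1+A2+B1, A1+B1+B2, or A1+A2+B1+B2


def dual_fraction(cells: List[Cell]) -> Optional[float]:
    """Fraction of dual TCR cells among QC-passing cells with a functional pair."""
    labels = [tcr_pairing(c) for c in cells if passes_qc(c)]
    paired = [label for label in labels if label != "excluded"]
    if not paired:
        return None
    return paired.count("dual") / len(paired)
```

      Note that the denominator excludes cells without a complete functional pair, so a proportion computed this way is not directly comparable with estimates that use all T cells as the denominator.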

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of double TCR T cell pairs in skin tissue samples in Section R1. Due to the scarcity of T cells in skin samples, we included some non-Treg cells during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of double TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type double TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions.

      (3) Issue of Cell Contamination: In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity: The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs: This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions: In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis: When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. The authors should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions: Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. eLife Assessment

      This study provides an important perspective on the influence of parental care in the establishment of the amphibian microbiome. Through a combination of cross-fostering experimental work, comparative analysis, and developmental time series, the authors provide compelling evidence that vertical transmission through care is possible, and solid but somewhat preliminary evidence that it plays a significant role in shaping frog skin microbiomes in nature or across time. This work will be of interest to researchers studying the evolution of parental care and microbiomes in vertebrates.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes a series of lab and field experiments to understand the role of tadpole transport in shaping the microbiome of poison frogs in early life. The authors conducted a cross-foster experiment in which R. variabilis tadpoles were carried by adults of their own species, carried by adults of another frog species, or not carried at all. After being carried for 6 hours, tadpole microbiomes resembled those of their caregiving species. Next, the authors reported higher microbiome diversity in tadpoles of two species that engage in transport-based parental care compared to one species that does not. Finally, they collected tadpoles either from the backs of an adult (i.e., they had recently been transported) or from eggs (i.e., not transported) but did not find significant overlap in microbiome composition between transported tadpoles and their parents.

      Strengths:

      The cross-foster experiment and the field experiment that reared transported and non-transported tadpoles are creative ways to address an important question in animal microbiome research. Together, they imply a small role for parental care in the development of the tadpole microbiome. The manuscript is generally well-written and easy to understand. The authors make an effort (improved since the first version of the manuscript) to acknowledge the limitations of their experimental design.

      Weaknesses:

      This manuscript has improved since the initial version and now more clearly discusses the limitations of its experimental design. I have no further revisions to request.

    3. Reviewer #2 (Public review):

      Summary:

      Here, Fischer et al. attempt to understand the role of parental care, specifically the transport of offspring, in the development of the amphibian microbiome. The amphibian microbiome is an important study system due to its association with host health and disease outcomes. This study identifies vertical transfer of bacteria through parental transport of tadpoles as one mechanism, among others, influencing tadpole microbiome composition. This paper gives insight into the relative roles of the environment, species, and parental care in amphibian microbiome assembly.

      The authors determine the time of bacterial colonization during tadpole development using PCR, observing that tadpoles were not colonized by bacteria prior to hatching from the vitelline membrane. This is an important finding for amphibian microbiome research and I would be curious to see if this is seen broadly across amphibian species. By doing this, the impact of transport can be more accurately assessed in their laboratory experiments. The authors found that caregiver species influenced community composition, with transported tadpoles sharing a greater proportion of their skin communities with the transporting species.

      In a comparison of three sympatric amphibian species that vary in their reproductive strategies, the authors found that tadpole community diversity was not reflective of habitat diversity, but may be associated with the different reproductive strategies of each species. Parental care explained some of the variance of tadpole microbiomes between species, however, transportation by conspecific adults did not lead to more similar microbiomes between tadpoles and adults compared to species that do not exhibit parental transport. This finding is in agreement with the understanding that the amphibian microbiome is distinct between developmental stages (eggs/tadpoles/adults) and also that amphibian microbiome composition is generally species specific.

      When investigating contributions of caretakers to transported offspring, the authors found that tadpole-adult pairs with a history of direct contact were not more similar than tadpole-adult pairs lacking that history. This conclusion was surprising when considering the direct contact between the adults and tadpoles, however if only certain taxa from the adults are capable of colonizing tadpoles, then one could expect that similar ASVs might be donated between tadpole-adult pairs.

      I did not find any major weaknesses in my review of this paper. I think that the data and conclusions here are of value to other researchers looking into the assembly of the amphibian microbiome. This paper offers insight into how tadpole-transport could influence the microbiome and adds to our overall understanding of amphibian microbiome assembly across the varied life histories of frogs.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1:

      (1) Developmental time series:

      It was not entirely clear how this experiment relates to the rest of the manuscript, as it does not compare any effects of transport within or across species.

      Implemented Changes:

      The importance of species arrival timing for community assembly is addressed in both the introduction and discussion. To accommodate the reviewer’s concerns and further emphasize this point, we have added a clarifying sentence to the results section and included an illustrative example with supporting literature in the discussion.

      Results: Clarifying the timing of initial microbial colonization is essential for determining whether and how priority effects mediate community assembly of vertically transmitted microbes in early life, or whether these microbes arrive into an already established microbial landscape. We used non-sterile frogs of our captive laboratory colony (…)

      Discussion: For example, early microbial inoculation has been shown to increase the relative abundance of beneficial taxa such as Janthinobacterium lividum (Jones et al., 2024), whereas efforts to introduce the same probiotic into established adult communities have not led to long-term persistence (Bletz, 2013; Woodhams et al., 2016).

      (2) Cross-foster experiment:

      The "heterospecific transport" tadpoles were manually brushed onto the back of the surrogate frog, while the "biological transport" tadpoles were picked up naturally by the parent. It is a little challenging to interpret the effect of caregiver species since it is conflated with the method of attachment to the parent. I noticed that the uptake of Os-associated microbes by Os-transported tadpoles seemed to be higher than the uptake of Rv-associated microbes by Rv-associated tadpoles (comparing the second box from the left to the rightmost boxplot in panel S2C). Perhaps this could be a technical artifact if manual attachment to Os frogs was more efficient than natural attachment to Rv frogs.

      I was also surprised to see so much of the tadpole microbiome attributed to Os in tadpoles that were not transported by Os frogs (25-50% in many cases). It suggests that SourceTracker may not be effectively classifying the taxa.

      Implemented Changes:

      Methods (Study species, reproductive strategies and life history): Oophaga sylvatica (Os) (Funkhouser, 1956; CITES Appendix II, IUCN Conservation status: Near Threatened) is a large, diurnal poison frog (family Dendrobatidae) inhabiting lowland and submontane rainforests in Colombia and Ecuador. While male Os care for the clutch of up to seven eggs, females transport 1-2 tadpoles at a time to water-filled leaf axils where tadpoles complete their development (Pašukonis et al., 2022; Silverstone, 1973; Summers, 1992). Notably, females return regularly to these deposition sites to provision their offspring with unfertilized eggs.

      Discussion: Most poison frogs transport tadpoles on their backs, but the mechanism of adherence remains unclear. Similar to natural conditions, tadpoles that are experimentally placed onto a caregiver’s back also gradually adhere to the dorsal skin, where they remain firmly attached for several hours as the adult navigates dense terrain. Although transport durations were standardized, species-specific factors, such as microbial density at the contact site, microbial taxa identity, and skin physiology such as moisture, could influence microbial transmission between the transporting frog and the tadpole. While these differences may have contributed to varying transmission efficacies observed between the two frog species in our experiment, none of these factors should compromise the correct microbial source assignment. We thus conclude that transporting frogs serve as a source of microbiota for transported tadpoles. However, further studies on species-specific physiological traits and adherence mechanisms are needed to clarify what modulates the efficacy of microbial transmission during transport, both under experimental and natural conditions.

      Methods (Vertical transmission): Cross-fostering tadpoles onto non-parental frogs has been used previously to study navigation in poison frogs (Pašukonis et al., 2017). According to our experience, successful adherence to both parent and heterospecific frogs depends on the developmental readiness of tadpoles, which must have retracted their gills and be capable of hatching from the vitelline envelope through vigorous movement. Another factor influencing cross-fostering success is the docility of the frog during initial attachment, as erratic movements easily dislodge tadpoles before adherence is established. Rv are small, jumpy frogs that are easily stressed by handling, making experimental fostering of tadpoles, even their own, impractical. Therefore, we favored an experimental design where tadpoles initiate natural transport and parental frogs pick them up with a 100% success rate. We chose the poison frog Os as foster frogs because adults are docile, parental care in this species involves transporting tadpoles, and skin microbial communities differ from Rv, a critical prerequisite for our SourceTracker analysis. The use of the docile Os as the foster species enabled a 100% cross-fostering success rate, with no notable differences in adherence strength after six hours.

      Methods (Sourcetracker Analysis): To assess training quality, we evaluated model self-assignment using source samples. We selected the model trained on a dataset rarefied to the read depth of the adult frog sample with the lowest read count (48162 reads), as it showed the best overall self-assignment performance, whereas models trained on datasets rarefied to the lowest overall read depth performed worse. Unlike studies using technical replicates, our source samples represent distinct biological individuals and sampling timepoints, where natural microbiome variability is expected within each source category. Consequently, we considered self-assignment rates above 70% acceptable. All source samples were correctly assigned to their respective categories (Rv, Os, or control), but with varying proportions of reads assigned as 'Unknown'. Adult frog sources were reliably self-identified with high confidence (Os: 97.2% median, IQR = 1.4; Rv: 76.3% median, IQR = 38.1). Adult R. variabilis frogs displayed a higher proportion of 'Unknown' assignments compared to O. sylvatica, likely reflecting greater biological variability among individuals and/or a higher proportion of rare taxa not well captured in the training set. The control tadpole source showed lower self-assignment accuracy (median = 30.5%, IQR = 17.1), as expected given the low microbial biomass of these samples, which resulted in low read depth. Low read depth limits the information available to inform the iterative updating steps in Gibbs sampling and reduces confidence in source assignments. We therefore verified the robustness of our results by performing the second Sourcetracker analysis as described above, training the model only on adult sources and assigning all tadpoles, including low-biomass controls, as sinks. Self-assignment rates for the second training set varied (O. sylvatica: 79.2% median, IQR = 29; R. variabilis: 96.6% median, IQR = 3.7), while results remained consistent across analyses, supporting the reliability of our findings.
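      The self-assignment check described above can be illustrated with a minimal sketch. It assumes that each source sample yields one proportion of reads assigned back to its own category and that the 70% criterion is applied to the median; both are assumptions, as are the function name and the example values, which are hypothetical and not taken from the study. The quartile method is also a choice, and other IQR definitions give slightly different numbers.

```python
from statistics import median, quantiles


def summarize_self_assignment(proportions, threshold=0.70):
    """Summarize per-sample self-assignment proportions for one source.

    `proportions` holds, for each source sample, the fraction of reads
    assigned back to its own source category (hypothetical input).
    """
    # Quartiles with the "inclusive" method; other definitions exist.
    q1, _, q3 = quantiles(proportions, n=4, method="inclusive")
    med = median(proportions)
    return {
        "median": med,
        "iqr": q3 - q1,
        # Assumption: the acceptance criterion is applied to the median.
        "acceptable": med >= threshold,
    }


# Hypothetical self-assignment proportions for one adult source category.
example_source = [0.95, 0.96, 0.97, 0.98, 0.99]
summary = summarize_self_assignment(example_source)
print(summary)
```

      Under this reading, a source with a low median, such as the low-biomass control tadpoles, would fail the criterion, which is consistent with the decision to repeat the analysis using adult sources only.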

      (3) Cross-species analysis:

      Like the developmental time series, this analysis doesn't really address the central question of the manuscript. I don't think it is fair for the authors to attribute the difference in diversity to parental care behavior, since the comparison only includes n=2 transporting species and n=1 non-transporting species that differ in many other ways. I would also add that increased diversity is not necessarily an expectation of vertical transmission. The similarity between adults and tadpoles is likely a more relevant outcome for vertical transmission, but the authors did not find any evidence that tadpole-adult similarity was any higher in species with tadpole transport. In fact, tadpoles and adults were more similar in the non-transporting species than in one of the transporting species (lines 296-298), which seems to directly contradict the authors' hypothesis. I don't see this result explained or addressed in the Discussion.

      To address the reviewer’s concerns, we implemented the following changes:

      Results:

      We rephrased the following sentence from the results part:

      “These variations may therefore be linked to differing reproductive traits: Af and Rv lay terrestrial egg clutches and transport hatchlings to water, whereas Ll, a non-transporting species, lays eggs directly in water.”

      To read

      “These variations may therefore reflect differences in life history traits among the three species.”

      We moved the information on differing reproductive strategies into the Discussion, where it contributes to a broader context alongside other life history traits that may influence community diversity.

      Discussion (1): We added to our discussion that increased microbial diversity was not an expected outcome of vertical transmission.

      “However, increased microbial diversity is not a known outcome of vertical transmission, and further studies across a broader range of transporting and non-transporting species are needed to assess the role of transport in shaping diversity of tadpole-associated microbial communities.”

      Discussion (2): Likewise, communities associated with adults and tadpoles of transporting species were no more similar than those of non-transporting species. While poison frog tadpoles do acquire caregiver-specific microbes during transport, most of these microbes do not persist on the tadpoles' skin long-term. This pattern can likely be attributed to the capacity of tadpole skin and gut microbiota to flexibly adapt to environmental changes (Emerson & Woodley, 2024; Santos et al., 2023; Scarberry et al., 2024). It may also reflect the limited compatibility of skin microbiota from terrestrial adults with aquatic habitats or tadpole skin, which differs structurally from that of adults (Faszewski et al., 2008). As a result, many transmitted microbes are probably outcompeted by microbial taxa continuously supplied by the aquatic environment. Interestingly, microbial communities of the non-transporting Ll were more similar to their adult counterparts than those of poison frogs. This pattern might reflect differences in life history among the species. While adult Ll commonly inhabit the rock pools where their tadpoles develop, adults of the two poison frog species visit tadpole nurseries only sporadically for deposition. These differences in habitat use may result in adult Ll hosting skin microbiota that are better adapted to aquatic environments as compared to Rv and Af. Additionally, their presence in the tadpoles’ habitat could make Ll a more consistent source of microbiota for developing tadpoles.

      (4) Field experiment: The rationale and interpretation of the genus-level network are not clear, and the figure is not legible. What does it mean to "visualize the microbial interconnectedness" or to be a "central part of the community"? The previous sentences in this paragraph (lines 337-343) seem to imply that transfer is parent-specific, but the genus-level network is based on the current adult frogs, not the previous generation of parents that transported them. So it is not clear that the distribution or co-distribution of these taxa provides any insight into vertical transmission dynamics.

      Implemented Changes:

      We appreciate the reviewer’s close reading and understand how the inclusion of the network visualization without further clarification may have led to confusion. To clarify, the network was constructed from all adult frogs in the population, including (but not limited to) the parental frogs examined in the field experiment. We do not make any claims about the origin of the microbial taxa found on parental frogs. Rather, our aim was to illustrate how genera retained on tadpoles (following potential vertical transmission) contribute to the skin microbial communities of adult frogs of this population beyond just the parental individuals. This finding supports the observation that these retained taxa are generally among the most abundant in adult frogs. However, since this information is already presented in Table S8 and the figure is not essential to the main conclusions, we have removed Supplementary Figure S5 and the accompanying sentence: “A genus-level network constructed from 44 adult frogs shows that the retained genera make up a central part of the community of adult Rv in wild populations (Fig. S5).” We have adjusted the Methods section accordingly.

      Reviewer #2:

      I did not find any major weaknesses in my review of this paper. The work here could potentially benefit from absolute abundance levels for shared ASVs between adults and tadpoles to more thoroughly understand the influences of vertical transmission that might be masked by relative abundance counts. This would only be a minor improvement as I think the conclusions from this work would likely remain the same, however.

      In response to the reviewer’s suggestion, we estimated the absolute abundance of specific ASVs for all samples of tadpoles in which Sourcetracker identified shared ASVs between adults and tadpoles. The resulting scaled absolute abundance values (in copies/μL and copies per tadpole) are provided in Table S10, and a description of the method has been incorporated into the revised Methods section of the manuscript. To support the robustness of this approach in our dataset, we additionally designed an ASV-specific system for ASV24902-Methylocella. Candidate primers were assessed for specificity by performing local BLASTn alignments against the full set of ASV sequences identified in the respective microbial communities of tadpoles. We optimized the annealing temperature via gradient PCR and confirmed primer specificity through Sanger sequencing of the PCR product (Forward: 5′–GAGCACGTAGGCGGATCT–3′; Reverse: 5′–GGACTACNVGGGTWTCTAAT–3′). Using this approach, we confirmed that the relative abundance of ASV24902 (18.05% in the amplicon sequencing data) closely matched its proportion of the absolute 16S rRNA copy number in transported tadpole 6 (18.01%). While we intended to quantify all shared ASVs, we were limited to this single target due to insufficient material for optimizing the assays. As this particular ASV was also detected in the water associated with the same tadpole, we chose not to include this confirmation in the manuscript. Nevertheless, the close match supports the reliability of our approach for scaling absolute abundances in this dataset.

      Results: Absolute abundances of shared ASVs likely originating from the parental source pool (as identified by Sourcetracker) after one month of growth ranged from 7804 to 172326 copies per tadpole (Table S10).

      Methods: Quantitative analysis of 16S rRNA copy numbers with digital PCR (dPCR)

      Absolute abundances were estimated for ASVs that were shared between tadpoles after a one-month growth period and their respective caregivers, and for which Sourcetracker analysis identified the caregiver as a likely source of microbiota. We followed the quantitative sequencing framework described by Barlow et al. (2020), measuring total microbial load via digital PCR (dPCR) with the same universal 16S rRNA primers used to amplify the v4 region in our sequencing dataset. Absolute 16S rRNA copy numbers obtained from dPCR were then multiplied by the relative abundances from our amplicon sequencing dataset to calculate ASV-specific scaled absolute abundances. All dPCR reactions were carried out on a QIAcuity Digital PCR System (Qiagen) using Nanoplates with an 8.5K partition configuration and the following cycling program: 95°C for 2 minutes; 40 cycles of 95°C for 30 seconds, 52°C for 30 seconds, and 72°C for 1 minute; followed by 1 cycle of 40°C for 5 minutes. Reactions were prepared using the QIAcuity EvaGreen PCR Kit (Qiagen, Cat. No. 250111) with 2 µL of DNA template per reaction, following the manufacturer's protocol, and included a negative no-template control and a cleaned and sequenced PCR product as positive control. Samples were measured in triplicate and serial dilutions were performed to ensure accurate quantification. Data were processed with the QIAcuity Software Suite (v3.1.0.0). The threshold was set based on the negative and positive controls in 1D scatterplots. We report mean copy numbers per microliter with standard deviations, correcting for template input, dPCR reaction volume, and dilution factor. Mean copy numbers per tadpole were additionally calculated by accounting for the DNA extraction (elution) volume.
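      The scaling arithmetic described in this Methods paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function names, the reaction volume, the dilution factor, the elution volume, and the relative abundance are hypothetical placeholders; only the 2 µL template input is taken from the text.

```python
def copies_per_ul_extract(copies_per_ul_reaction, reaction_volume_ul,
                          template_volume_ul, dilution_factor=1.0):
    """Convert a dPCR readout (copies per µL of reaction) into copies per µL
    of the undiluted DNA extract, correcting for template input, reaction
    volume, and any dilution of the template."""
    copies_in_reaction = copies_per_ul_reaction * reaction_volume_ul
    return copies_in_reaction / template_volume_ul * dilution_factor


def scaled_absolute_abundance(total_16s_copies, relative_abundance):
    """Scale total 16S rRNA copies by an ASV's relative abundance (0 to 1)."""
    return total_16s_copies * relative_abundance


def copies_per_tadpole(copies_per_ul, elution_volume_ul):
    """Account for the DNA elution volume to obtain copies per tadpole."""
    return copies_per_ul * elution_volume_ul


# Hypothetical example values (only the 2 µL template is from the text).
total_per_ul = copies_per_ul_extract(
    copies_per_ul_reaction=100.0,  # hypothetical dPCR readout
    reaction_volume_ul=12.0,       # hypothetical reaction volume
    template_volume_ul=2.0,        # template input stated in the Methods
    dilution_factor=10.0,          # hypothetical 1:10 dilution
)
asv_per_ul = scaled_absolute_abundance(total_per_ul, relative_abundance=0.25)
asv_per_tadpole = copies_per_tadpole(asv_per_ul, elution_volume_ul=50.0)
print(total_per_ul, asv_per_ul, asv_per_tadpole)
```

      Keeping the three corrections in separate functions makes it easy to see which factor drives a given value, which matters when serial dilutions differ between samples.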

      Recommendations for the authors:

      Reviewer #1:

      (1) Figure 1b summarizes the ddPCR data as a binary (detected/not detected), but this contradicts the main text associated with this figure, which describes bacteria as present, albeit in low abundances, in unhatched embryos (lines 145-147). Could the authors keep the diagram of tadpole development, which I find very useful, but add the ddPCR data from Figure S1c instead of simply binarizing it as present/absent?

      We appreciate the reviewer’s positive feedback on the clarity of the figure. We agree that presenting the ddPCR data in a more quantitative manner provides a more accurate representation of bacterial abundance across developmental stages. In response, we have retained the developmental diagram, as suggested, and replaced the binary (detected/not detected) information in Figure 1B with rounded mean values for each stage. To complement this, we have included mean values and standard deviations in Table S1. The corresponding text in the main manuscript and legends has been revised accordingly to reflect these changes.

      (2) More information about the foster species, Oophaga sylvatica, would be helpful. Are they sympatric with Rv? Is their transporting behavior similar to that of Rv?

      We thank the reviewer for this helpful comment. In response, we have added further details on the biology and parental care behavior of Oophaga sylvatica, including information on its distribution range. The species does not overlap with Ranitomeya variabilis at the specific study site where the field work was conducted, although the species are sympatric in other countries. These additions have been incorporated into the Methods section under "Study species, reproductive strategies, and life history."

      (3) Plotting the proportion of each tadpole microbiome attributed to R. variabilis and the proportion attributed to O. sylvatica on the same plot is confusing, as these points are non-independent and there is no way for the reader to figure out which points originated from the same tadpole. I would suggest replacing Figure 1D with Figure S2C, which (if I understand correctly) displays the same data, but is separated according to source.

      We agree with the reviewer that Figure S2C allows for clearer interpretation of our results. In response, we implemented the suggested change and replaced Figure 1D with the alternative visualization previously shown in Figure S2C, which displays the same data separated by source. To provide readers with a complementary overview of the full dataset, we have retained the original combined plot in the supplementary material as Figure S2D.

      (4) On the first read, I found the use of "transport" in the cross-fostering experiment confusing until I understood that they weren't being transported "to" anywhere in particular, just carried for 6 hours. A change of phrasing might help readers here.

      We acknowledge the reviewer’s concern and have replaced “transported” with “carried” to avoid confusion for readers who may be unfamiliar with the behavioral terminology. However, because “transport” is the term widely used by specialists to describe this behavior, we now introduce it in the context of the experimental design with the following phrasing:

      “For this design, sequence-based surveys of amplified 16S rRNA genes were used to assess the composition of skin-associated microbial communities on tadpoles and their adult caregivers (i.e., the frogs carrying the tadpoles, typically referred to as ‘transporting’ frogs).”

      (5) "Horizontal transfer" typically refers to bacteria acquired from other hosts, not environmental source pools (line 394).

      We addressed this concern by rephrasing the sentence in the Discussion to avoid potential confusion. The revised text now reads:

      “Across species, newborns might acquire bacteria not only through transfer from environmental source pools and other hosts (…)”

      (6) The authors suggest that tadpole transport may have evolved in Rv and Af to promote microbial diversity because "increased microbial diversity is linked to better health outcomes" (lines 477-479). It is often tempting to assume that more diversity is always better/more adaptive, but this is not universally true. The fact that the Ll frogs seem to be doing fine in the same environment despite their lower microbiome diversity suggests that this interpretation might be too far of a reach based on the data here.

      We appreciate the reviewer’s concern, agree that increased microbial diversity is not inherently advantageous, and have revised the paragraph to make this clearer.

      “While increased microbial diversity is not inherently advantageous, it has been associated with beneficial outcomes such as improved immune function, lower disease risk, and enhanced fitness in multiple other vertebrate systems.”

      However, rather than claiming that greater diversity is always advantageous, we suggest that this possibility should not be excluded and consider it a relevant aspect of a comprehensive discussion. We also note that whether poison frog tadpoles perform equally well with lower microbial diversity remains an open question. Drawing such conclusions would require experimental validation and cannot be inferred from comparisons with an evolutionarily distant species that differs in life history.

      Reviewer #2:

      (1) Figure 2: Are the data points in C a subset (just the tadpoles for each species) of B? The numbers look a little different between them. The number of observed ASVs in panel B for Rv look a bit higher than the observed ASVs in panel C.

      The data shown in panel C are indeed a subset of the samples presented in panel B, focusing specifically on tadpoles of each species. The slight differences in the number of observed ASVs between panels result from differences in rarefaction depth between comparisons: due to variation in sequencing depth across species and life stages, we performed rarefaction separately for each comparison in order to retain the highest number of taxa while ensuring comparability within each group. Although we acknowledge that this is not a standard approach, we found that results were consistent when rarefying across the full dataset, but chose the presented approach to better accommodate variation in our sample structure. This methodological detail is described in the Methods section:

      “All alpha diversity analyses were conducted with datasets rarefied to 90% of the read number of the sample with the fewest reads in each comparison and visualized with boxplots.”

      It is also noted in the figure legend: “The dataset was separately rarefied to the lowest read depth of each comparison.” We hope this clarification adequately addresses the reviewer’s concern and therefore have not made additional changes.
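      The per-comparison rarefaction described in this response can be sketched as below. This is a minimal illustration, assuming each sample is a mapping from ASV label to read count; the function names, sample names, and counts are hypothetical, and the study's actual rarefaction tool is not stated here. Reads are subsampled without replacement to 90% of the smallest library in the comparison.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample.

    `counts` maps ASV label -> read count (hypothetical input format).
    """
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    rarefied = {}
    for asv in rng.sample(pool, depth):
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def rarefy_comparison(samples, percent=90, seed=0):
    """Rarefy every sample in one comparison to `percent` of the read number
    of the sample with the fewest reads. Returns (depth, rarefied samples)."""
    min_reads = min(sum(c.values()) for c in samples.values())
    depth = min_reads * percent // 100  # integer arithmetic avoids float error
    rng = random.Random(seed)
    return depth, {name: rarefy(c, depth, rng) for name, c in samples.items()}


# Hypothetical comparison with two tadpole samples.
samples = {
    "tadpole_1": {"ASV_a": 60, "ASV_b": 40},
    "tadpole_2": {"ASV_a": 150, "ASV_c": 50, "ASV_d": 5},
}
depth, rarefied = rarefy_comparison(samples)
observed_asvs = {name: len(c) for name, c in rarefied.items()}
print(depth, observed_asvs)
```

      Because the depth depends on which samples are included, the same sample can show a different number of observed ASVs in two comparisons, which is the effect the reviewer noticed between panels B and C.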

      (2) Lines 304-305: in the Figure 4B plot, there appear to be 12 transported tadpoles and 8 non-transported tadpoles.

      Thank you for catching this. We have corrected the plot and the associated statistics (alpha and beta diversity) in the results section as well as in the figure. Importantly, the correction did not affect any other results, and the overall findings and interpretations remain unchanged.

      (3) Line 311: I think this should be Figure 4B.

      (4) Line 430: tadpole transport.

      (5) Line 431: I believe commas need to surround this phrase "which range from a few hours to several days depending on the species (Lötters et al., 2007; McDiarmid & Altig, 1999; Pašukonis et al., 2019)".

      We thank the reviewer for the thorough review and have corrected all typographical and formatting errors noted in comments (3)–(5).

    1. eLife Assessment

      This study demonstrates the application of END-seq, originally developed to study genomewide DNA double-strand breaks, to telomere biology; the work packs a punch, concisely demonstrating the utility of this approach and the new insights that can be gained. The authors confirm that telomeres in telomerase-positive cells terminate with 5'-ATC in a Pot1-dependent manner, and demonstrate that this principle holds true in telomerase-negative ALT cells as well. S1-END-seq is similarly developed for telomeres, showing that ALT cells harbor several regions of ssDNA. The study is well-executed and convincing, the new insights are fundamental and compelling, and the optimized END-seq approaches will be widely utilized. The work will prompt additional studies that the reviewers look forward to, including combining telomeric END-seq with long-read sequencing to address the distribution and origin of variant telomere repeats and ssDNA along telomeres in ALT and telomerase-positive settings.

    2. Reviewer #1 (Public review):

      Summary

      This manuscript from Azeroglu et al. presents the application of END-Seq to examine the sequence composition of chromosome termini, i.e., telomeres. END-seq is a powerful genome sequencing strategy developed in André Nussenzweig's lab to examine the sequences at DNA break sites. Here, END-Seq is applied to explore the nucleotide sequences at telomeres and to ascertain (i) whether the terminal end sequence is conserved in cells that activate the ALT telomere elongation mechanism and (ii) whether the processes responsible for telomere end sequence regulation are conserved. With these aims clearly articulated, the authors convincingly show the power of this technique to examine telomere end-processing.

      Strengths

      (1) The authors effectively demonstrate the application of END-seq for these purposes. They verify prior data that 5' terminal sequences of telomeres in HeLa and RPE cells end in a canonical ATC sequence motif. They verify that the same sequence is present at the 5' ends of telomeres by performing END-seq across a panel of ALT cancer cells. As in non-ALT cells, the established role of POT1, a ssDNA telomere binding protein, in coordinating the mechanism that maintains the canonical ATC motif is likewise verified. However, by performing END-Seq in mouse cells lacking POT1 isoforms, POT1a and POT1b, the authors uncover that POT1b is dispensable for this process. This reveals a novel, important insight relating to the evolution of POT1 as a telomere regulatory factor.

      (2) The authors then demonstrate the utility of S1-END-seq, a variation of END-Seq, to explore the purported abundance of single-stranded DNA within telomeres of ALT cancer cells. Here, they demonstrate that ssDNA abundance is an intrinsic aspect of ALT telomeres and is dependent on the activity of BLM, a crucial mediator of ALT.

      Overall, the authors have effectively shown that END-seq can be applied to examine processes maintaining telomeres in normal and cancerous cells across multiple species. Using END-Seq, the authors confirm prior cell biological and sequencing data and the role of POT1 and BLM in regulating telomere termini sequences and ssDNA abundance. The study is nice and well-written, with the experimental rationale and outcomes clearly explained.

      Weaknesses

      This reviewer finds little to argue with in this study. It is timely and highly valuable for the telomere field. One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      Comments on revisions:

      The authors addressed the comments. Thank you.

    3. Reviewer #2 (Public review):

      This is a short yet very clear manuscript demonstrating that two methods (END-seq and S1-END-seq), previously developed in the Nussenzweig laboratory to study DSBs in the genome, can also be applied to the 5' ends of mammalian telomeres and the accumulation of telomeric single-stranded DNA.

      The authors first validate the applicability of END-seq using different approaches and confirm that mammalian telomeres preferentially end with an ATC 5' end through a mechanism that requires intact POT1 (POT1a in mice). They then extend their analysis to cells that maintain telomeres through the ALT mechanism and demonstrate that, in these cells as well, telomeres frequently end in an ATC 5' sequence via a POT1-dependent mechanism. Using S1-END-seq, the authors further show that ALT telomeres contain single-stranded DNA and estimate that each telomere in ALT cells harbors at least five regions of ssDNA.

      I find this work very interesting and incisive. It clearly demonstrates that END-seq can be applied with unprecedented depth and precision to the study of telomeric features such as the 5' end and ssDNA. The data are very clear and thoroughly interpreted, and the manuscript is well written. The results are carefully analyzed and effectively presented. Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field.

      Also, the authors have satisfactorily addressed my previous comments.

    4. Reviewer #3 (Public review):

      Summary:

      A subset of cancer cells attain replicative immortality by activating the ALT mechanism of telomere maintenance, which is currently the subject of intense research due to its potential for novel targeted therapies. Key questions remain in the field, such as whether ALT telomeres adhere to the same end-protection rules as telomeres in telomerase-expressing cells, or if ALT telomeres possess unique properties that could be targeted with new, less toxic cancer therapies. Both questions, along with the approaches developed by the authors to address them, are highly relevant.

      Strengths:

      Since chromosome ends resemble one-ended DSBs, the authors hypothesized that the previously described END-SEQ protocol could be used to accurately sequence the 5' end of telomeres on the C-rich strand. As expected, most reads corresponded to the C-rich strand and, confirming previous observation by the de Lange's group, most chromosomes end with the ATC-5' sequence, a feature that was found to be dependent on POT1 and to be conserved in both human ALT cells and mouse cells. Through a complementary method, S1-END-SEQ, the authors further explored ssDNA regions at telomeres, providing new insights into the characteristics of ALT telomeres. The study is original, the experiments were well-controlled and excellently executed.

      Weaknesses:

      A few additional experiments would have strengthened the results such as combining error-free long-read sequencing with END-SEQ to compare the abundance of VTRs within telomeres versus at their distal ends.

      Along this line, are VTRs increased at ssDNA regions of ALT telomeres? What is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      To what extent do ECTRs contribute to telomeric ssDNA?

      Future experiments may help shed light on this.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2 (Recommendations for the authors):

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3 (Recommendations for the authors):

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. As a result, we hope that we provide additional depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is the CST-Polα complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. eLife Assessment

      This important study addresses the role of non-genetic factors in individual differences in phenotype. Using C. elegans, the study finds that non-genetic differences in gene expression, partly influenced by the environment, correlate with individual differences in two reproductive traits. This supports the use of gene expression data as a key intermediate for understanding complex traits. The clever study design makes for compelling evidence.

    2. Reviewer #1 (Public review):

      Summary:

      Genome-wide association studies have been an important approach to identifying the genetic basis of human traits and diseases. Despite their successes, for many traits, a substantial amount of variation cannot be explained by genetic factors, indicating that environmental variation and individual 'noise' (stochastic differences as well as unaccounted for environmental variation) also play important roles. The authors' goal was to address how gene expression variation in genetically identical individuals, driven by historical environmental differences and 'noise', could be used to predict reproductive trait differences.

      Strengths:

      To address this question, the authors took advantage of genetically identical C. elegans individuals to transcriptionally profile 180 adult hermaphrodite individuals that were also measured for two reproductive traits. A major strength of the paper is in its experimental design. While experimenters aim to control the environment that each worm experiences, it is known that there are small differences even when worms are grown together on the same agar plate - e.g., the age of their mother, their temperature, the amount of food they eat, and the oxygen and carbon dioxide levels depending on where they roam on the plate. Instead of neglecting this unknown variation, the authors design the experiment up front to create two differences in the historical environment experienced by each worm: 1) the age of its mother and 2) an 8-hour temperature difference, either 20 or 25 °C. This helped the authors interpret the gene expression differences and trait expression differences that they observed.

      Using two statistical models, the authors measured the association of gene expression for 8824 genes with the two reproductive traits, considering both the level of expression and the historical environment experienced by each worm. Their data supports several conclusions. They convincingly show that gene expression differences are useful for predicting reproductive trait differences, predicting ~25-50% of the trait differences depending on the trait. Using RNAi, they also show that the genes they identify play a causal role in trait differences. Finally, they demonstrate an association with trait variation and the H3K27 trimethylation mark, suggesting that chromatin structure can be an important causal determinant of gene expression and trait variation.

      Overall, this work supports the use of gene expression data as an important intermediate for understanding complex traits. This approach is also useful as a starting point for other labs in studying their trait of interest.

      Weaknesses:

      There are no major weaknesses that I have noted. Some important limitations of their work are worth highlighting, though (and I believe the authors would agree with these points):

      (1) A large remaining question in the field of complex traits remains in splitting the role of non-genetic factors between environmental variation and stochastic noise. It is still an open question which role each of these factors plays in controlling the gene expression differences they measured between the individual worms.

      (2) The ability of the authors to use gene expression to predict trait variation was strikingly different between the two traits they measured. For the early brood trait, 448 genes were statistically linked to the trait difference, while for egg-laying onset, only 11 genes were found. Similarly, the total R² in the test set was ~50% vs. 25%. It is unclear why the differences occur, but this somewhat limits the generalizability of this approach to other traits.

      (3) For technical reasons, this approach was limited to whole worm transcription. The role of tissue and cell-type expression differences is important to the field, so this limitation is relevant.

      Comments on revisions: The authors have addressed my previous comments to my satisfaction.

    3. Reviewer #2 (Public review):

      This paper measures associations between RNA transcript levels and important reproductive traits in the model organism C. elegans. The authors go beyond determining which gene expression differences underlie reproductive traits, but also (1) build a model that predicts these traits based on gene expression and (2) perform experiments to confirm that some transcript levels indeed affect reproductive traits. The clever study design allows the authors to determine which transcript levels impact reproductive traits, and also which transcriptional differences are driven by stochastic vs environmental differences. In sum, this is a comprehensive study that highlights the power of gene expression as a driver of phenotype, and also teases apart the various factors that affect the expression levels of important genes.

      Overall, this study has many strengths, is very clearly communicated, and has no substantial weaknesses that I can point to.

      One question that emerges for me is whether these findings apply broadly. In other words, I wonder whether gene expression levels are predictive of other phenotypes in other organisms. I think this question has largely been explored in microbes, where some studies (PMID: 17959824) but not others (PMID: 38895328) found that differences in gene expression were predictive of phenotypes like growth rate. Microbes are not the focus here, and instead, the discussion is mainly focused on using gene expression to predict health and disease phenotypes in humans. This feels a little complicated since humans have so many different tissues. Perhaps an area where this approach might be useful is in examining infectious single-cell populations (bacteria, tumors, fungi). But I suppose this idea might still work in humans, assuming the authors are thinking about targeting specific tissues for RNAseq.

      In sum, this is a great paper that really got me thinking about the predictive power of gene expression and where/when it could inform about (health-related) phenotypes.

      Comments on revisions: No additional comments

    4. Reviewer #3 (Public review):

      Summary:

      Webster et al. sought to understand if phenotypic variation in the absence of genetic variation can be predicted by variation in gene expression. To this end they quantified two reproductive traits, the onset of egg laying and early brood size in cohorts of genetically identical nematodes exposed to alternative ancestral (two maternal ages) and same generation life histories (either constant 20 °C temperature or 8-hour temperature shift to 25 °C upon hatching) in a two-factor design; then, they profiled genome-wide gene expression in each individual.

      Using multiple statistical and machine learning approaches, they showed that, at least for early brood size, phenotypic variation can be quite well predicted by molecular variation, beyond what can be predicted by life history alone.

      Moreover, they provide some evidence that expression variation in some genes might be causally linked to phenotypic variation.

      Strengths:

      Cleverly designed and carefully performed experiments that provide high-quality datasets useful for the community.

      Good evidence that phenotypic variation can be predicted by molecular variation.

      Weaknesses:

      What drives the molecular variation that impacts phenotypic variation remains unknown. While the authors show that variation in expression of some genes might indeed be causal, it is still not clear how much of the molecular variation is a cause rather than a consequence of phenotypic variation.

      Comments on revisions: I have no more comments for the authors

    5. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Genome-wide association studies have been an important approach to identifying the genetic basis of human traits and diseases. Despite their successes, for many traits, a substantial amount of variation cannot be explained by genetic factors, indicating that environmental variation and individual 'noise' (stochastic differences as well as unaccounted for environmental variation) also play important roles. The authors' goal was to address whether gene expression variation in genetically identical individuals, driven by historical environmental differences and 'noise', could be used to predict reproductive trait differences.

      Strengths:

      To address this question, the authors took advantage of genetically identical C. elegans individuals to transcriptionally profile 180 adult hermaphrodite individuals that were also measured for two reproductive traits. A major strength of the paper is its experimental design. While experimenters aim to control the environment that each worm experiences, it is known that there are small differences that each worm experiences even when they are grown together on the same agar plate - e.g. the age of their mother, their temperature, the amount of food they eat, and the oxygen and carbon dioxide levels depending on where they roam on the plate. Instead of neglecting this unknown variation, the authors design the experiment up front to create two differences in the historical environment experienced by each worm: 1) the age of its mother and 2) an 8-hour temperature difference, either 20 or 25 °C. This helped the authors interpret the gene expression differences and trait expression differences that they observed.

      Using two statistical models, the authors measured the association of gene expression for 8824 genes with the two reproductive traits, considering both the level of expression and the historical environment experienced by each worm. Their data supports several conclusions. They convincingly show that gene expression differences are useful for predicting reproductive trait differences, predicting ~25-50% of the trait differences depending on the trait. Using RNAi, they also show that the genes they identify play a causal role in trait differences. Finally, they demonstrate an association with trait variation and the H3K27 trimethylation mark, suggesting that chromatin structure can be an important causal determinant of gene expression and trait variation.

      Overall, this work supports the use of gene expression data as an important intermediate for understanding complex traits. This approach is also useful as a starting point for other labs in studying their trait of interest.

      We thank the reviewer for their thorough articulation of the strengths of our study.

      Weaknesses:

      There are no major weaknesses that I have noted. Some important limitations of the work (that I believe the authors would agree with) are worth highlighting, however:

      (1) A large remaining question in the field of complex traits remains in splitting the role of non-genetic factors between environmental variation and stochastic noise. It is still an open question which role each of these factors plays in controlling the gene expression differences they measured between the individual worms.

      Yes, we agree that this is a major question in the field. In our study, we parse out differences driven between known historical environmental factors and unknown factors, but the ‘unknown factors’ could encompass both unknown environmental factors and stochastic noise.

      (2) The ability of the authors to use gene expression to predict trait variation was strikingly different between the two traits they measured. For the early brood trait, 448 genes were statistically linked to the trait difference, while for egg-laying onset, only 11 genes were found. Similarly, the total R² in the test set was ~50% vs. 25%. It is unclear why the differences occur, but this somewhat limits the generalizability of this approach to other traits.

      We agree that the difference in predictability between the two traits is interesting. A previous study from the Phillips lab measured developmental rate and fertility across Caenorhabditis species and parsed sources of variation (1). Results indicated that 83.3% of variation in developmental rate was explained by genetic variation, while only 4.8% was explained by individual variation. In contrast, for fertility, 63.3% of variation was driven by genetic variation and 23.3% was explained by individual variation. Our results, of course, focus only on predicting the individual differences, but not genetic differences, for these two traits using gene expression data. Considering both sets of results, one hypothesis is that we have more power to explain nongenetic phenotypic differences with molecular data if the trait is less heritable, which is something that could be formally interrogated with more traits across more strains.

      (3) For technical reasons, this approach was limited to whole worm transcription. The role of tissue and cell-type expression differences is important to the field, so this limitation is important.

      We agree with this assessment, and it is something we hope to address with future work.

      Reviewer #2 (Public review):

      Summary:

      This paper measures associations between RNA transcript levels and important reproductive traits in the model organism C. elegans. The authors go beyond determining which gene expression differences underlie reproductive traits: they also (1) build a model that predicts these traits based on gene expression and (2) perform experiments to confirm that some transcript levels indeed affect reproductive traits. The clever study design allows the authors to determine which transcript levels impact reproductive traits, and also which transcriptional differences are driven by stochastic vs environmental differences. In sum, this is a rather comprehensive study that highlights the power of gene expression as a driver of phenotype, and also teases apart the various factors that affect the expression levels of important genes.

      Strengths:

      Overall, this study has many strengths, is very clearly communicated, and has no substantial weaknesses that I can point to. One question that emerges for me is about the extent to which these findings apply broadly. In other words, I wonder whether gene expression levels are predictive of other phenotypes in other organisms. I think this question has largely been explored in microbes, where some studies (PMID: 17959824) but not others (PMID: 38895328) find that differences in gene expression are predictive of phenotypes like growth rate. Microbes are not the primary focus here, and instead, the discussion is mainly focused on using gene expression to predict health and disease phenotypes in humans. This feels a little complicated since humans have so many different tissues. Perhaps an area where this approach might be useful is in examining infectious single-cell populations (bacteria, tumors, fungi). But I suppose this idea might still work in humans, assuming the authors are thinking about targeting specific tissues for RNAseq.

      In sum, this is a great paper that really got me thinking about the predictive power of gene expression and where/when it could inform about (health-related) phenotypes.

      We thank the reviewer for recognizing the strengths of our study. We are also interested in determining the extent to which predictive gene expression differences operate in specific tissues.

      Reviewer #3 (Public review):

      Summary:

      Webster et al. sought to understand if phenotypic variation in the absence of genetic variation can be predicted by variation in gene expression. To this end they quantified two reproductive traits, the onset of egg laying and early brood size, in cohorts of genetically identical nematodes exposed to alternative ancestral (two maternal ages) and same generation life histories (either constant 20°C temperature or 8-hour temperature shift to 25°C upon hatching) in a two-factor design; then they profiled genome-wide gene expression in each individual.

      Using multiple statistical and machine learning approaches, they showed that, at least for early brood size, phenotypic variation can be quite well predicted by molecular variation, beyond what can be predicted by life history alone.

      Moreover, they provide some evidence that expression variation in some genes might be causally linked to phenotypic variation.

      Strengths:

      (1) Cleverly designed and carefully performed experiments that provide high-quality datasets useful for the community.

      (2) Good evidence that phenotypic variation can be predicted by molecular variation.

      We thank the reviewer for recognizing the strengths of our study.

      Weaknesses:

      What drives the molecular variation that impacts phenotypic variation remains unknown. While the authors show that variation in expression of some genes might indeed be causal, it is still not clear how much of the molecular variation is a cause rather than a consequence of phenotypic variation.

      We agree that the drivers of molecular variation remain unknown. While we addressed one potential candidate (histone modifications), there is much to be done in this area of research. We agree that, while some gene expression differences cause phenotypic changes, other gene expression differences could in principle be downstream of phenotypic differences.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have a number of suggestions that I believe will improve the Methods section.

      (1) Strain N2-PD1073 will probably be confusing to some readers. I recommend spelling out that this is the Phillips lab version of N2.

      Thank you for this suggestion; we have added additional explanation of this strain in the Methods.

      (2) I found the details of the experimental design confusing, and I believe a supplemental figure will help. I have listed the following points that could be clarified:

      a. What were the biological replicates? How many worms per replicate?

      Biological replicates were defined as experiments set up on different days (in this case, all biological replicates were at least a week apart), and the biological replicate of each worm can be found in Supplementary File 1 on the Phenotypic Data tab.

      b. I believe that embryos and L4s were picked to create different aged P0s, and eggs and L4s were picked to separate plates? Is this correct?

      Yes, this is correct.

      c. What was the spread in the embryo age?

      We assume this question concerns the age of the F1 embryos; these were laid over the course of a 2-hour window.

      d. While the age of the parents is different, there are also features about their growth plates that will be impacted by the experimental design. For example, their pheromone exposure is different due to the role that age plays in the combination of ascarosides that are released. It is worth noting as my reading of the paper makes it seem that parental age is the only thing that matters.

      The parents (P0) of different ages likely have differential ascaroside exposure because they are in the vicinity of other similarly aged worms, but the F1 progeny were exposed to their parents for only the 2-hour egg-laying window, in an attempt to minimize this type of effect as much as possible.

      e. Were incubators used for each temperature?

      Yes.

      f. In line 443, why approximately for the 18 hours? How much spread?

      The approximation was based on the time interval between the 2-hour egg-laying window on Day 4 and the temperature shift on Day 5 the following morning. The timing was within 30 minutes of 18 hours in either direction.

      g. In line 444, "continually left" is confusing. Does this mean left in the original incubator?

      Yes, this means left in the incubator while the worms shifted to 25°C were moved. To avoid confusion, we re-worded this to state they “remained at 20°C while the other half were shifted to 25°C”.

      h. In line 445, "all worms remained at 20°C" was confusing to me as to what it indicated. I assume, unless otherwise noted, the animals would not be moved to a new temperature.

      This was an attempt to avoid confusion and emphasize that all worms were experiencing the same conditions for this part of the experiment.

      i. What size plates were the worms singled onto?

      They were singled onto 6-cm plates.

      j. If a figure were to be made, having two timelines (with respect to the P0 and F1) might be useful.

      We believe the methods should be sufficient for someone who hopes to repeat the experiment, and we believe the schematic in Figure 1A labeling P0 and F1 generations is sufficient to illustrate the key features of the experimental design.

      k. Not all eggs that are laid end up hatching. Are these censored from the number of progeny calculations?

      Yes, only progeny that hatched and developed were counted for early brood.

      (3) For the lysis, was the second transfer to dH2O also a wash step?

      Yes.

      (4) What was used for the Elution buffer?

      We used elution buffer consisting of 10 mM Tris, 0.1 mM EDTA. We have added this to the “Cell lysate generation” section of the methods.

      (5) The company that produced the KAPA mRNA-seq prep kit should be listed.

      We added that the kit was from Roche Sequencing Solutions.

      (6) For the GO analysis - one potential issue is that the set of 8824 genes might also be restricted to specific GO categories. Was this controlled for?

      We originally did not explicitly control for this and used the default enrichGO settings with OrgDB = org.Ce.eg.db as the background set for C. elegans. We have now repeated the analysis with the “universe” set to the 8824-gene background set. This did not qualitatively change the significant GO terms, though some have slightly higher or lower p-values. For comparison purposes, we have added the background-corrected sets to the GO_Terms tab of Supplementary File 1 with each of the three main gene groups appended with “BackgroundOf8824”.
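
      The effect of the background choice can be illustrated with a minimal over-representation sketch. It is not the enrichGO implementation; it only shows, using a one-sided hypergeometric test and hypothetical counts (a genome of 20,000 genes, a term annotated to 200 genes genome-wide and to 150 genes within the 8824-gene background, and 20 hits among 448 genes), how the same number of hits yields a different p-value once the universe is restricted to the genes that were actually testable.

```python
from math import comb


def enrichment_p(hits, universe_size, term_size, list_size):
    """One-sided hypergeometric p-value: P(X >= hits).

    universe_size: number of genes in the background
    term_size:     background genes annotated to the GO term
    list_size:     number of genes in the list being tested
    """
    total = comb(universe_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the study).
p_genome = enrichment_p(hits=20, universe_size=20000, term_size=200, list_size=448)
p_background = enrichment_p(hits=20, universe_size=8824, term_size=150, list_size=448)

print(f"whole-genome universe: {p_genome:.3g}")
print(f"8824-gene universe:    {p_background:.3g}")
```

      With these made-up numbers the restricted universe gives the larger (more conservative) p-value, because the term is relatively more common among the testable genes than in the whole genome.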

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract, introduction, and experimental design are well thought through and very clear.

      Thank you.

      (2) Figure 1B could use a clearer or more intuitive label on the horizontal axis. The two examples help. Maybe the genes (points) on the left side should be blue to match Figure 1C, where the genes with a negative correlation are in the blue cluster.

      Thank you for these suggestions. We re-labeled the x-axis as “Slope of early brood vs. gene expression (normalized by CPM)”, which we hope gives readers a better intuition of what the coefficient from the model is measuring. We also re-colored the points previously colored red in Figure 1B to be color-coded depending on the direction of association to match Figure 1C, so these points are now color-coded as pink and purple.

      (3) If red/blue are pos/neg correlated genes in 1C, perhaps different colors should be used to label ELO and brood in Figures 2 and 3. Green/purple?

      We appreciate this point, but since we ended up using the cluster colors of pink and purple in Figure 1, we opted to leave Figures 2 and 3 alone with the early brood and ELO color-coding of red and blue.

      (4) I am unfamiliar with this type of beta value, but I thought the explanation and figure were very clear. It could be helpful to bold beta1 and beta2 in the top panels of Figure 2, so the readers are not searching around for those among all the other betas. It could also be helpful to add an English phrase to the vertical axes in Figures 2C and 2D, in addition to the beta1 and beta2. Something like "overall effect (beta1)" and "environment-controlled effect (beta2)". Or maybe "effect of environment + stochastic expression differences (beta1)" and "effect of stochastic expression differences alone (beta2)". I guess those are probably too big to fit on the figure, but it might be nice to have a label somewhere on this figure connecting them to the key thing you are trying to measure - the effect of gene expression and environment.

      Thank you for these suggestions. We increased the font sizes and bolded β1 and β2 in Figure 2A-B. In Figure 2C-D, we added a parenthetical under β1 to say “(env + noise)” and β2 to say “(noise)”. We agree that this should give the reader more intuition about what the β values are measuring.

      Reviewer #3 (Recommendations for the authors):

      The authors collected individuals 24 hours after the onset of egg laying for transcriptomic profiling. This is a well-designed experiment to control for the physiological age of the germline. However, this does not properly control for somatic physiological age. Somatic age can be partially uncoupled from germline age across individuals, and indeed, this can be due to differences in maternal age (Perez et al., 2017). This is because maternal age is associated with increased pheromone exposure (unless you properly controlled for it by moving worms to fresh plates), which causes a germline-specific developmental delay in the progeny, resulting in a delayed onset of egg production compared to somatic development (Perez et al., 2021). You control for germline age; therefore, it is likely that the progeny of day 1 mothers are actually somatically older than the progeny of day 3 mothers. This would predict that many genes identified in these analyses might just be somatic genes that increase or decrease their expression during the young adult stage.

      For example, the abundance of collagen genes among the genes negatively associated (including col-20, which is the gene most significantly associated with early brood) is a big red flag, as collagen genes are known to be changing dynamically with age. If variation in somatic vs germline age is indeed what is driving the expression variation of these genes, then the expectation is that their expression should decrease with age. Vice versa, genes positively associated with early brood that are simply explained by age should be increasing. So I would suggest that the authors first check this using time series transcriptomic data covering the young adult stage they profiled. If this is indeed the case, I would then suggest using RAPToR (https://github.com/LBMC/RAPToR), a method that, using reference time series data, can estimate physiological age (including tissue-specific one) from gene expression. Using this method they can estimate the somatic physiological age of their samples, quantify the extent of variation in somatic age across individuals, quantify how much of the observed differences in expressions are explained just by differences in somatic age and correct for them during their transcriptomic analysis using the estimated soma age as a covariate (https://github.com/LBMC/RAPToR/blob/master/vignettes/RAPToR-DEcorrection-pdf.pdf).

      This should help enrich for molecular variation that is not simply driven by hidden differences between somatic and germline age.

      To first address some of the experimental details mentioned for our paper, parents were indeed moved to fresh plates where they were allowed to lay embryos for two hours and then removed. Thus, we believe this minimizes the effects of ascarosides as much as possible within our design. As shown in the paper, we also identified genes that were not driven by parental age and, for all genes, quantified the extent to which each gene’s association was driven by parental age. Thus, it is unlikely that differences in somatic and germline age are the sole explanatory factor, even if they play some role. We also note that we accounted for egg-laying onset timing in our experimental design, and early brood was calculated as the number of progeny laid in the first 24 hours of egg-laying, where egg-laying onset was scored for each individual worm to the hour. The plot of each worm’s ELO and early brood traits is in Figure S1. Nonetheless, we read the RAPToR paper with interest, as we highlighted in the paper that germline genes tend to be positively associated with early brood while somatic genes tend to be negatively associated. While the RAPToR paper discusses using tissue-specific gene sets to stage genetically diverse C. elegans RILs, the RAPToR reference itself was not built using gene expression data acquired from different C. elegans tissues and is based on whole worms, typically collected in bulk. That is, age estimates in RILs differ depending on whether germline or somatic gene sets are used to estimate age when the aging clock is based on N2 samples. Thus, it is unclear whether such an approach would work similarly to estimate age in single worm N2 samples. In addition, from what we can tell, the RAPToR R package appears to implement the overall age estimate, rather than using the tissue-specific gene sets used for RILs in the paper. Because RAPToR would be estimating the overall age of our samples using a reference that is based on fewer samples than we collected here, and because we already know the overall age of our samples measured using standard approaches, we believe that estimating the age with the package would not give very much additional insight.

      Bonferroni correction:

      First, I think there is some confusion in how the authors report their p-values: I don't think the authors are using a cut-off of Bonferroni corrected p-value of 5.7 × 10⁻⁶ (it wouldn't make sense). It's more likely that they are using a Bonferroni corrected p of 0.05 or 0.1, which corresponds to a nominal p value of 5.7 × 10⁻⁶, am I right?

      Yes, we used a nominal p-value of 5.7 × 10⁻⁶ to correspond to a Bonferroni-corrected p-value of 0.05, calculated as 0.05/8824. We have re-worded this wherever Bonferroni correction was mentioned.
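
      The arithmetic behind this threshold can be checked in a few lines. The sketch below only restates the calculation given in the response (0.05 divided by 8824 tests); the helper name `passes_bonferroni` is hypothetical and introduced for illustration.

```python
n_genes = 8824        # number of genes tested (from the response above)
family_alpha = 0.05   # Bonferroni-corrected (family-wise) significance level

# Nominal per-gene threshold that corresponds to the corrected level.
nominal_threshold = family_alpha / n_genes
print(f"nominal p-value threshold: {nominal_threshold:.1e}")


def passes_bonferroni(p_nominal, n_tests=n_genes, alpha=family_alpha):
    """Equivalent view: scale the nominal p-value by the number of tests."""
    return min(p_nominal * n_tests, 1.0) <= alpha
```

      Comparing a nominal p-value against 0.05/8824, or comparing the p-value multiplied by 8824 against 0.05, are two statements of the same rule, which is the source of the wording confusion the reviewer points out.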

      Second, Bonferroni is an overly stringent correction method that has now been substituted by the more powerful Benjamini-Hochberg method to control the false discovery rate. Using this might help find more genes and better characterize the molecular variation, especially the one associated with ELO?

      We agree that Bonferroni is quite stringent and because we were focused on identifying true positives, we may have some false negatives. Because all nominal p-values are included in the supplement, it is straightforward for an interested reader to search the data to determine if a gene is significant at any other threshold.
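
      For a reader who wants to re-threshold the supplementary p-values, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure next to Bonferroni. The p-values are made up for illustration and are not from the study; the function names are hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the test is rejected at FDR q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


def bonferroni(pvals, alpha=0.05):
    """Return a list of booleans: True where p <= alpha / number of tests."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


# Hypothetical nominal p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", sum(bonferroni(example_p)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(example_p)))
```

      On this toy input the step-up procedure rejects one more hypothesis than Bonferroni, which is the gain in power the reviewer refers to, at the cost of controlling the false discovery rate rather than the family-wise error rate.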

      Minor comments:

      (1) "In our experiment, isogenic adult worms in a common environment (with distinct historical environments) exhibited a range of both ELO and early brood trait values (Fig S1A)" I think this and the figure are not really needed; Figure S1B is already enough to show the range of the phenotypes and how much variation is driven by the life history traits.

      We agree that the information in S1A is also included in S1B, but we think it is a little more straightforward if one is primarily interested in viewing the distribution for a single trait.

      (2) Line 105 It should be Figure S2, not S3.

      Thank you for catching this mistake.

      (3) Gene Ontology on positive and negatively associated genes together: what about splitting the positive and negative?

      We have added a split of positive and negative GO terms to the GO_Terms tab of Supplement File 1. Broadly speaking, the most enriched positively associated genes have many of the same GO terms found on the combined list that are germline related (e.g., involved in oogenesis and gamete generation), whereas the most enriched negatively associated genes have GO terms found on the combined list that are related to somatic tissues (e.g., actin cytoskeleton organization, muscle cell development). This is consistent with the pattern we see for somatic and germline genes shown in Figure 4.

      (4) A lot of muscle-related GOs, can you elaborate on that?

      Yes, there are several muscle-related GOs in addition to germline and epidermis. While we do not know exactly why from a mechanistic perspective these muscle-related terms are enriched, it may be important to note that many of these terms have highly overlapping sets of genes which are listed in Supplementary File 1. For example, “muscle system process” and “muscle contraction” have the exact same set of 15 genes causing the term to be significantly enriched. Thus, we tend to not interpret having many GO terms on a given tissue as indicating that the tissue is more important than others for a given biological process. While it is clear there are genes related to muscle that are associated with early brood, it is not yet clear that the tissue is more important than others.

      (5) "consistent with maternal age affecting mitochondrial gene expression in progeny" - has this been previously reported?

      We do not believe this particular observation has been reported. It is important to note that these genes are involved in mitochondrial processes, but are expressed from the nuclear rather than mitochondrial genome. We re-worded the quoted portion of the sentence to say “consistent with parental age affecting mitochondria-related gene expression in progeny”.

      (6) PCA: "Therefore, the optimal number of PCs occurs at the inflection points of the graph, which is after only 7 PCs for early brood (R² of 0.55) but 28 PCs for ELO (R² of 0.56)."

      Not clear how this is determined: just graphically? If yes, there are several inflection points in the plot. How did you choose which one to consider? Also, a smaller component is not necessarily less predictive of phenotypic variation (as you can see from the graph), so instead of successively adding components based on the variance they explain in the transcriptomic data, you might add them based on the variance they explain in the phenotypic data? To this end, have you tried partial least squares regression instead of PCA? This should give gene expression components that are ranked based on how much phenotypic variance they explain.

      Thank you for this thoughtful comment. We agree that, unlike for Figure 3B, some interpretation is involved in deciding how many PCs are optimal because the additional variance explained with each PC is not strictly decreasing beyond a certain number of PCs. Our assessment was therefore made both graphically and by looking at the additional variance explained with each additional PC. For example, for early brood, there was no PC after PC7 that added more than 0.04 to the R². We could also have plotted early brood and ELO separately and had a different ordering of PCs on the x-axis. By plotting the data this way, we emphasized that the factors that explain the most variation in the gene expression data typically explain most variation in the phenotypic data.
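
      One way to make the "additional variance explained" criterion explicit is sketched below. This is an assumed formalization, not the authors' code: given the cumulative R² after each added PC, it reports the last PC whose increment exceeds a chosen gain (0.04 here, following the example in the response). The cumulative R² values are hypothetical, chosen only so that the curve reaches 0.55 at PC7.

```python
def last_informative_pc(cumulative_r2, min_gain=0.04):
    """Return the 1-based index of the last PC adding more than min_gain to R²."""
    previous = 0.0
    last = 0
    for pc_index, r2 in enumerate(cumulative_r2, start=1):
        if r2 - previous > min_gain:
            last = pc_index
        previous = r2
    return last


# Hypothetical cumulative R² values after adding PC1, PC2, ..., PC10.
early_brood_r2 = [0.20, 0.28, 0.33, 0.34, 0.40, 0.45, 0.55, 0.56, 0.57, 0.58]
print("last PC adding more than 0.04:", last_informative_pc(early_brood_r2))
```

      Note that the rule tolerates small intermediate increments (PC4 in the toy data), which reflects the point that the gain per PC is not strictly decreasing.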

      (7) The fact that there are 7 PCs of molecular variation that explain early brood is interesting. I think the authors can analyze this further. For example, could you perform separate GO enrichment for each component that explains a sizable amount of phenotypic variance? Same for the ELO.

      Because each gene has a loading for each PC, and each PC lacks the explanatory power of combined PCs, we believe performing GO term analysis on the list of genes that contribute most to each PC is of minimal utility. The power of the PCA prediction approach is that it uses the entire transcriptome, but the other side of the coin is that it is perhaps less useful to do a gene-by-gene analysis with PCA. This is why we separately performed individual gene associations and 10-gene predictive analyses. However, we have added the PC loadings for all genes and all PCs to Supplementary File 1.

      (8) Avoid acronyms when possible (i.e. ELO in figures and figure legends could be spelled out to improve readability).

      We appreciate this point, but because we introduced the acronym both in Figure 1 and the text and use it frequently, we believe the reader will understand this acronym. Because it is sometimes needed (especially in dense figures), we think it is best to use it consistently throughout the paper.

      (9) Multiple regression: I see the most selected gene is col-20, which is also the most significantly differentially expressed from the linear mixed model (LMM). But what is the overlap between the top 300 genes in Figure 3F and the 448 identified by the LMM? And how much is the overlap in GO enrichment?

      Genes that showed up in at least 4 out of 500 iterations were selected more often than expected by chance, which includes 246 genes (as indicated by the red line in Figure 3F). Of these genes, 66 genes (27%) are found in the set of 448 early brood genes. The proportion of overlap increases as the number of iterations required to consider a gene predictive increases, e.g., 34% of genes found in 5 of 500 iterations and 59% of genes found in 10 of 500 iterations overlap with the 448 early brood genes. However, likely because of the approach to identify groups of 10 genes that are predictive, we do not find significant GO terms among the 246 genes identified with this approach after multiple test correction. We think this makes sense because the LMM identifies genes that are individually associated with early brood, whereas each subsequent gene included in multiple regression affects early brood after controlling for all previous genes. These additional genes added to the multiple regression are unlikely to have similar patterns as genes that are individually correlated with early brood.

      (10) Elastic nets: prediction power is similar or better than multiple regression, but what is the overlap between genes selected by the elastic net (not presented if I am not mistaken) and multiple regression and the linear mixed model?

      For the elastic net models, we used a leave-one-out cross validation approach, meaning there were separate models fit by leaving out the trait data for each worm, training a model using the trait data and transcriptomic data for the other worms, and using the transcriptomic data of the remaining worm to predict the trait data. By repeating this for each worm, the regressions shown in the paper were obtained. Each of these models therefore has its own set of genes. Of the 180 models for early brood, the median model selects 83 genes (range from 72 to 114 genes). Across all models, 217 genes were selected at least once. Interestingly, there was a clear bimodal distribution in terms of how many models a given gene was selected for: 68 genes were selected in over 160 out of 180 models, while 114 genes were selected in fewer than 20 models (and 45 genes were selected only once). Therefore, we consider the set of 68 genes as highly robustly selected, since they were selected in the vast majority of models. This set of 68 exhibits substantial overlap with both the set of 448 early brood-associated genes (43 genes or 63% overlap) and the multiple regression set of 246 genes (54 genes or 79% overlap). For ELO, the median model selected 136 genes (range of 96 to 249 genes) and a total of 514 genes were selected at least once. The distribution for ELO was also bimodal with 78 genes selected over 160 times and 255 genes selected fewer than 20 times. This set of 78 included 6 of the 11 significant ELO genes identified in the LMM. We have added tabs to Supplementary File 1 that include the list of genes selected for the elastic net models as well as a count of how many times they were selected out of 180 models.
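
      The bookkeeping described here, counting how often each gene is selected across leave-one-out models and then splitting genes into robustly and rarely selected sets, can be sketched as follows. The model fitting itself is omitted; the per-model gene sets, gene names, and cut-offs in the toy call are hypothetical stand-ins for the 180 fitted models and the thresholds of 160 and 20.

```python
from collections import Counter


def selection_counts(models):
    """Count, for each gene, the number of models in which it was selected."""
    counts = Counter()
    for selected_genes in models:
        counts.update(set(selected_genes))
    return counts


def split_by_robustness(counts, high, low):
    """Genes selected in more than `high` models, and in fewer than `low`."""
    robust = sorted(g for g, c in counts.items() if c > high)
    rare = sorted(g for g, c in counts.items() if c < low)
    return robust, rare


def overlap_fraction(gene_set, reference_set):
    """Fraction of gene_set that is also present in reference_set."""
    gene_set = set(gene_set)
    return len(gene_set & set(reference_set)) / len(gene_set)


# Toy stand-in for the per-worm models: six models instead of 180.
toy_models = [
    {"geneA", "geneC"},
    {"geneA", "geneC"},
    {"geneA", "geneC"},
    {"geneA", "geneB"},
    {"geneA"},
    {"geneA"},
]
counts = selection_counts(toy_models)
robust, rare = split_by_robustness(counts, high=5, low=2)
print("robustly selected:", robust)
print("rarely selected:", rare)
print("overlap with a reference set:", overlap_fraction(robust, {"geneA", "geneZ"}))
```

      Applied to the real counts, the same overlap calculation reproduces the percentages quoted above (for example, 43 of 68 genes is 63%).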

      (11) In other words, do these different approaches yield similar sets of genes, or are there some differences? In the end, which approach is actually giving the best predictive power?

      From the perspective of R², both the multiple regression and elastic net models are similarly predictive for early brood, but elastic net is more predictive for ELO. However, in presenting multiple approaches, part of our goal was identifying predictive genes that could be considered the ‘best’ in different contexts. The multiple regression was set to identify exactly 10 genes, whereas the elastic net model determined the optimal number of genes to include, which was always over 70 genes. Thus, the elastic net model is likely better if one has gene expression data for the entire transcriptome, whereas the multiple regression genes are likely more useful if one were to use reporters or qRT-PCR to measure a more limited number of genes.

      (12) Line 252: "Within this curated set, genes causally affected early brood in 5 of 7 cases compared to empty vector (Figure 4A)." It seems to me 4 out of 7 from Figure 4A.

      In Figure 4A the five genes are (1) cin-4, (2) puf-5; puf-7, (3) eef-1A.2, (4) C34C12.8, and (5) tir-1. We did not count nex-2 (p = 0.10) or gly-13 (p = 0.07), and empty vector is the control.

      (13) Do puf-5 and -7 affect total brood size or only early brood size? Not clear. What's the effect of single puf-5 and puf-7 RNAi on brood?

      We only measured early brood in this paper, but a previous report found that puf-5 and puf-7 act redundantly to affect oogenesis, and RNAi is only effective if both are knocked down together (2). We performed pilot experiments to confirm that this was the case in our hands as well.

      (14) To truly understand if the noise in expression of puf-5 and/or puf-7 really causes some of the observed difference in early brood, could the authors use a reporter and dose response RNAi to reduce the level of puf-5/7 to match the lower physiological noise range and observe if the magnitude of the reduction of early brood by the right amount of RNAi indeed matches the observed physiological "noise" effect of puf-5/7 on early brood?

      We agree that it would be interesting to do the dose response of RNAi, measure early brood, and get a readout of mRNA levels to determine the true extent of gene knockdown in each worm (since RNAi can be noisy) and whether this corresponds to early brood when the knockdown is at physiological levels. While we believe we have shown that a dose response of gene knockdown results in a dose response of early brood, this additional analysis would be of interest for future experiments.

      (15) Regulated soma genes (enriched in H3K27me3) are negatively correlated with early brood. What would be the mechanism there? As mentioned before, it is more likely that these genes are just indicative of variation in somatic vs germline age (maybe due to latent differences in parental perception of pheromone).

      We can think of a few potential mechanisms/explanations, but at this point we do not have a decisive answer. Regulated somatic genes marked with H3K27me3 (facultative heterochromatin) are expressed in particular tissues and/or at particular times in development. In this study and others, genes marked with H3K27me3 exhibit more gene expression noise than genes with other marks. This could suggest that there are negative consequences for the animal if genes are expressed at higher levels at the wrong time or place, and one interpretation of the negative association is that higher expression of somatic genes results in lower fitness (where early brood is a proxy for fitness). Another related interpretation is that there are tradeoffs between somatic and germline development and each individual animal lands somewhere on a continuum between prioritizing germline or somatic development, where prioritizing somatic integrity (e.g. higher expression of somatic genes) comes at a cost to the germline resulting in fewer progeny. Additional experiments, including measurements of histone marks in worms measured for the early brood trait, would likely be required to more decisively answer this question.

      (16) Line 151: "Among significant genes for both traits, ฮฒ2 values were consistently lower than ฮฒ1 (Figures 2CD), suggesting some of the total effect size was driven by environmental history rather than pure noise".

      We are interpreting this quote as part of point 17 below.

      (17) It looks like most of the genes associated with phenotypes from the univariate model have a decreased effect once you account for life history, but have you checked for cases where the life history actually masks the effect of a gene? In other words, do you have cases where the effect of gene expression on a phenotype is only (or more) significant after you account for the effect of life history (ฮฒ2 values higher than ฮฒ1)?

      This is a good question and one that we did not explicitly address in the paper because we focused on beta values for genes that were significant in the univariate analysis. Indeed, for the sets of 448 early brood genes and 11 ELO genes, there are no genes for which β2 is larger than β1. In looking at the larger dataset of 8824 genes, with a Bonferroni-corrected p-value of 0.05, there are 306 genes with a significant β2 for early brood. The majority (157 genes) overlap with the 448 genes significant in the univariate analysis and do not have a higher β2 than β1. Of the remaining genes, 72 have a larger β2 than β1. However, in most cases, this difference is relatively small (median difference of 0.025) and likely insignificant. There are only three genes in which β1 is not nominally significant, and these are the three genes with the largest difference between β1 and β2 with β2 being larger (differences of 0.166, 0.155, and 0.12). In contrast, the median difference between β1 and β2 for the 448 genes (in which β1 is larger) is 0.17, highlighting that the most extreme examples of β2 > β1 are smaller in magnitude than the typical case of β1 > β2. For ELO, there are no notable cases where β2 > β1. There are eight genes with a significant β2 value, and all of these have a β1 value that is nominally significant. Therefore, while this phenomenon does occur, we find it to be relatively rare overall. For completeness, we have added the β1 and β2 values for all 8824 genes as a tab in Supplementary File 1.
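
      The two situations discussed in this exchange (β2 < β1 when life history explains part of the association, and β2 > β1 when life history masks it) can be illustrated with a toy calculation. This is only a sketch under stated assumptions: it uses ordinary least squares on standardized variables with a single life-history covariate, which is not necessarily the model used in the paper, and the function names and data are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def standardized_betas(trait, expr, history):
    """Return (beta1, beta2) for standardized variables (assumed OLS).

    beta1: slope of expr in  trait ~ expr
    beta2: slope of expr in  trait ~ expr + history
    """
    r_te = pearson(trait, expr)
    r_th = pearson(trait, history)
    r_eh = pearson(expr, history)
    beta1 = r_te
    beta2 = (r_te - r_th * r_eh) / (1 - r_eh ** 2)
    return beta1, beta2

# Hypothetical toy data: h is life history, e is history-independent expression.
h = [-1, -1, -1, -1, 1, 1, 1, 1]
e = [-1, 1, -1, 1, -1, 1, -1, 1]
expr = [a + b for a, b in zip(h, e)]

# Case 1: history drives both expression and trait, so beta2 < beta1.
trait_confounded = [2 * a + 0.5 * b for a, b in zip(h, e)]
print(standardized_betas(trait_confounded, expr, h))

# Case 2: history opposes the expression effect, so beta1 is masked.
trait_masked = [b - a for a, b in zip(h, e)]
print(standardized_betas(trait_masked, expr, h))
```

      In the second case the univariate slope vanishes even though expression has a clear effect once life history is held fixed, which is the masking scenario the reviewer asks about.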

    1. eLife Assessment

      The authors address a fundamental question for cell and tissue biology. They use the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors provide compelling evidence that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation, providing in vivo evidence of the interdependency of cell mechanics and differentiation. To illustrate their points, the authors use a combination of genetic mouse models, RNA sequencing, and immunofluorescence analysis. Precisely how the changes in gene expression, cell morphology, mechanics, and cell position are instructive and whether consecutive changes in differentiation are required still remain unclear, but the paper takes a nice step in advancing our knowledge of the process.

    2. Reviewer #1 (Public review):

      Summary:

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality and the evidence is strengthened through a combination of different genetic mouse models, RNA sequencing and immunofluorescence analysis.

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics and cell position. Whether any of these changes is instructive for differentiation itself, and whether consecutive changes in differentiation are required, remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental timepoints the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14) when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells) and E16 when the epidermis comprises three (living) compartments where the spinous layer separates basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using RNA bulk sequencing, the authors developed useful new markers for suprabasal stages of differentiation like MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model.

      The data indicate that early in development at E14 the suprabasal intermediate cells resemble, in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation.

      Previous studies by several groups found an increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT) thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10-Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of the suprabasal position in either spinous or granular layer, indicating that increased contractility is key to induce late differentiation of granular cells. A potential weakness is the use of the K10-spastin model, which disrupts MT and likely has additional roles in altering differentiation beyond the induction of hypercontractility. Their previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al. Cell Stem Cell 2021). Moreover, their data are now further supported by a second model activating myosin through RhoA. This manuscript extends their previous findings that indicated a role for contractility in early differentiation, now focusing on the regulation of late differentiation in barrier-forming cells. This data set thus helps to unravel the interdependencies of cell position, mechanical state and differentiation in the epidermis, and suggests that an increase in cellular contractility within the epidermis can induce terminal differentiation.
Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer.

      Overall, this is a well-written manuscript and a comprehensive dataset.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk RNA sequencing, they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB, is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and overexpression of cytoskeletal regulators accelerates differentiation of spinous layer cells into granular cells.

      Overall, the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories, and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics.

      Strengths:

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow.

      Weaknesses:

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal as spastin has multiple roles in regulating microtubule dynamics and membrane transport which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II as shown for the intermediate layer. It would also be logical to address if further increasing contractility in the intermediate layer would enhance the differentiation of these cells.

      Despite these minor weaknesses, the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis and will likely be of interest to the skin research community.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor, Yap. The paper also identifies new markers that distinguish IC and spinous layers, and shows the spinous signature gene, MafB, is sufficient to repress proliferation when prematurely expressed in ICs.

      Strengths:

      Overall this is a well-executed study, and the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding, and adds to a growing list of examples where cell mechanics influence gene expression and differentiation.

      Weaknesses:

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility in differentiation. The inclusion of loss-of-function approaches would enable one to determine if, for example, contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. Inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation.

      Finally, the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.

    5. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality and the evidence is strengthened through a combination of different genetic mouse models, RNA sequencing, and immunofluorescence analysis.

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics, and cell position. Whether any of these changes is instructive for differentiation itself and whether consecutive changes in differentiation are required remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental time points the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14) when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells) and E16 when the epidermis comprises three (living) compartments where the spinous layer separates basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using RNA bulk sequencing, the authors developed useful new markers for suprabasal stages of differentiation like MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model.

      The data indicate that early in development at E14 the suprabasal intermediate cells resemble, in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation.

      Previous studies by several groups found an increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT) thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10-Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of the suprabasal position in either spinous or granular layer, indicating that increased contractility is key to induce late differentiation of granular cells. A potential weakness of the K10-spastin model is the disruption of MT as the primary effect which secondarily causes hypercontractility. However, their previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al. Cell Stem Cell 2021). Moreover, the data are confirmed by the second model directly activating myosin through RhoA. These previous publications already indicated a role for contractility in differentiation but were focused on early differentiation. The data in this manuscript focus on the regulation of late differentiation in barrier-forming cells. These important data help to unravel the interdependencies of cell position, mechanical state, and differentiation in the epidermis, suggesting that an increase in cellular contractility in most apical positions within the epidermis can induce terminal differentiation.
Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer.

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing result should be presented more transparently, providing the full lists of regulated genes instead of presenting just the GO analysis and selected target genes, so that this analysis can serve as a useful repository. The authors themselves have profited from and used published datasets of gene expression of the granular cells. Moreover, some of the previous data should be better discussed. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes, like filaggrin and loricrin, are actually not regulated (Muroyama and Lechler eLife 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated.

      We thank the reviewers for their suggestions and comments.

      Thank you for the suggestion to include gene lists. We had an Excel document with all this data but neglected to upload it with the initial manuscript. This includes all the gene signatures for the different cell compartments across development. We also include a tab that lists all EDC genes and whether they were up-regulated in intermediate cells and cells in which contractility was induced. Further, we note that all the RNA-Seq datasets are available for use on GEO (GSE295753).

      In our previous publication, we indeed included images showing that loricrin and filaggrin were both still expressed in the differentiated epidermis in the spastin mutant. Both Flg and Lor mRNA were up in the RNA-Seq (although only Flg was statistically significant), though we didn’t see a notable change in protein levels. It is unclear whether this is just difficult to see on top of the normal expression, or whether there are additional levels of regulation where mRNA levels are increased but protein isn’t. That said, our data clearly show that other genes associated with granular fate were increased in the contractile skin.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk RNA sequencing, they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB, is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and overexpression of cytoskeletal regulators accelerates the differentiation of spinous layer cells into granular cells.

      Overall the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells, and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics.

      Strengths:

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing, and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow.

      Weaknesses:

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal as spastin has multiple roles in regulating microtubule dynamics and membrane transport which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II as shown for the intermediate layer. It would also be logical to address if further increasing contractility in the intermediate layer would enhance the differentiation of these cells.

      We agree with the reviewer that the development of additional tools to precisely control myosin activity will be of great use to the field. That said, our series of publications has clearly demonstrated that ablating microtubules results in increased contractility and that this phenocopies the effects of Arhgef11-induced contractility. Further, we showed that these phenotypes were rescued by myosin inhibition with blebbistatin. Our prior publications also showed a clear increase in junctional acto-myosin through expression of either spastin or Arhgef11, as well as increased staining for the tension-sensitive epitope of alpha-catenin (alpha18). We are not aware of tools that allow direct manipulation of myosin activity that currently exist in mouse models.

      The gene expression analyses are relatively superficial and rely heavily on GO term analyses, which are of course informative but do not give the reader a good sense of what kind of genes and transcriptional programs are regulated. It would be useful to show volcano plots or heatmaps of actual gene expression changes as well as to perform additional analyses, for example gene set enrichment and/or transcription factor enrichment analyses, to better describe the transcriptional programs.

      We have included an Excel document that lists all the gene signatures. In addition, a volcano plot is included in the new Fig 2, Supplement 1. All our NGS data are deposited in GEO for others to perform these analyses. As the paper does not delve further into transcriptional regulation, we do not specifically present this information in the paper.

      Claims of changes in cell division/proliferation are made exclusively by quantifying EdU incorporation. It would be useful to more directly look at mitosis. At minimum, Y-axis labels should be changed from "% Dividing cells" to "% EdU+ cells" to more accurately represent the findings.

      We changed the axis label to precisely match our analysis. We note that Figure 1, Supplement 1 also contains data on mitosis.

      Despite these minor weaknesses the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis, and will likely be of interest to the skin research community.

      Reviewer #3 (Public review):

      Summary:

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor, Yap. The paper also identifies new markers that distinguish IC and spinous layers and shows the spinous signature gene, MafB, is sufficient to repress proliferation when prematurely expressed in ICs.

      Strengths:

      Overall this is a well-executed study, and the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much-understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding and adds to a growing list of examples where cell mechanics influence gene expression and differentiation.

      Weaknesses:

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility in differentiation. The inclusion of loss-of-function approaches would enable one to determine if, for example, contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. The inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation.

      We agree that loss-of-function studies would be useful. For MafB, these have been performed in cultured human keratinocytes, where loss of MafB and its ortholog cMaf results in a phenotype consistent with loss of spinous differentiation (Pajares-Lopez et al, 2015). Due to the complex genetics involved, generating these double mutant mice is beyond the scope of this study. Loss-of-function studies of myosin are also complicated by genetic redundancy of the non-muscle type II myosin genes, as well as the role for these myosins in cell division and in actin cross-linking in addition to contractility. In addition, we have found that these myosins are quite stable in the embryonic intestine, with loss of protein delayed by several days from the induction of recombination. Therefore, elimination of myosins by embryonic day E14.5 with our current drivers is not likely possible. Generation of inducible inhibitors of contractility is therefore a valuable future goal.

      Several recent papers have used AFM of skin sections to probe tissue stiffness. We have not attempted these studies and are unclear about the spatial resolution and whether, in the very thin epidermis at these stages, we could spatially resolve differences. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated that there was a significant increase in this over a tissue-wide scale (Ning et al, Cell Stem Cell, 2021).

      Finally, whether the expression of granular-associated genes in ICs provides them with some sort of barrier function in the embryo is not addressed, so the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells. In addition, many EDC genes, important for keratinocyte cornification and barrier formation, are not upregulated in ICs at E14.5. We have attempted experiments to ablate intermediate cells with DTA expression; these resulted in inefficient and delayed death and thus did not yield strong conclusions about the role of intermediate cells. Our finding that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells. In fact, previous studies have shown that Grhl3 null mice have disrupted barrier function at embryonic stages (Ting et al, 2005), supporting a role for ICs in barrier formation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing result should be presented more transparently, providing the full lists of regulated genes instead of presenting just the GO analysis and selected target genes, so that this analysis can serve as a useful repository. The authors themselves have profited from and used the published dataset of gene expression of the granular cells. Moreover, some of the previous data should be better discussed. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes such as filaggrin and loricrin are actually not regulated (Muroyama and Lechler eLife 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated.

      A general point regarding statistics throughout the manuscript: it seems that regular t-tests or ANOVAs, which assume a Gaussian distribution, have been used for sample sizes below N=5, which is technically not correct. Instead, non-parametric tests such as the Mann-Whitney test should be used. Since GraphPad was used for statistics according to the methods, this is easy to change.
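
As a rough illustration of the non-parametric alternative mentioned above, the sketch below computes an exact two-sided Mann-Whitney test in plain Python by enumerating every way of relabeling the pooled values. The function names and the example measurements are hypothetical and are not taken from the manuscript; in practice GraphPad or an established statistics library would be used instead.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_exact(group_a, group_b):
    """Return (U statistic of group_a, exact two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a, n_b = len(group_a), len(group_b)
    expected_u = n_a * n_b / 2

    def u_of(indices):
        # U = rank sum of the chosen group minus its minimum possible rank sum.
        return sum(ranks[i] for i in indices) - n_a * (n_a + 1) / 2

    observed = u_of(range(n_a))
    deviation = abs(observed - expected_u)
    # Enumerate every assignment of n_a pooled values to group A.
    splits = list(combinations(range(len(pooled)), n_a))
    extreme = sum(
        1 for s in splits if abs(u_of(s) - expected_u) >= deviation - 1e-12
    )
    return observed, extreme / len(splits)


# Hypothetical measurements from three animals per condition.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
u_stat, p_value = mann_whitney_exact(control, treated)
print(u_stat, p_value)
```

With three values per group there are only 20 possible relabelings, so even complete separation of the groups gives a two-sided exact p-value of 0.1. This is one reason why small-N comparisons are hard to call significant with rank-based tests.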

      Figure 1: It would be good to show the FACS plot of the analyzed and sorted population in the supplementary figures.

      If granular cells can be analyzed and detected by FACS, why were they not included in the RNA sequencing analysis?

      Figure 1 supplement 1c: cell division numbers are analyzed from only 2 mice, and the combined 5 or 4 fields of view are used for statistics with a test assuming normal distribution, which is not really appropriate. Means per mouse should be used or, if accumulated fields of view are used, their number should be increased and more stringent tests applied. Otherwise, the p-values here clearly overstate the significance.

      Granular cells could not be specifically isolated in the approach we used. The lectin binds to both upper spinous and granular cells. For this reason, we relied on a separate granular gene list as described.

      For Figure 1 Supplement 1, we removed the statistical analysis and now use it simply as a validation of the data in Figure 1.

      Figure 2: It is not completely clear on which basis the candidate genes were picked. They are described as the most enriched, but how do they compare to the rest of the enriched genes? The full list of regulated genes should be provided.

      Some markers for IC or granular layer are verified either by RNA scope or immunofluorescence. Is there a technical reason for that? It would be good to compare protein levels for all markers.

      Figure 2-Supplement 1: There is no statement about the number of animals that these images are representative of.

      We have included a volcano plot to show where the genes picked reside. We have also included the full gene lists for interested readers.

      When validated antibodies were available, we used them. When they were not, we performed RNA-Scope to validate the RNA-Seq dataset.

      We have included animal numbers in the revised Fig 2-Supplement 2 legend (previously Fig 2-Supplement 1).

      Figure 4b: It would be good to include the E16 spinous cells to get an idea of how much closer ICs are to the granular population.

      We have included a new Venn diagram showing the overlap between each of the IC and spinous signatures with the granular cell signature in Fig 4B. Overall, 36% of IC signature genes are in common with granular cells, while just 20% of spinous genes overlap.
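
For readers who want to reproduce this kind of overlap figure from the gene lists, a minimal sketch of the arithmetic follows. It assumes each signature is available as a simple collection of gene identifiers; the function name and the placeholder identifiers are hypothetical, and the numbers in the example are not the study's data.

```python
def percent_overlap(signature, reference):
    """Percentage of genes in `signature` that also appear in `reference`."""
    signature, reference = set(signature), set(reference)
    if not signature:
        raise ValueError("signature gene set is empty")
    return 100 * len(signature & reference) / len(signature)


# Placeholder identifiers, for illustration only.
granular = {"g1", "g2", "g3", "g9"}
intermediate = {"g1", "g2", "g4", "g5", "g6"}
spinous = {"g3", "g5", "g6", "g7", "g8"}

print(percent_overlap(intermediate, granular))
print(percent_overlap(spinous, granular))
```

The percentage is taken relative to the size of the query signature (IC or spinous), not the granular list, which matches the phrasing "36% of IC signature genes are in common with granular cells".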

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 6B is confusing, as the y-axis is labeled as EdU+ suprabasal cells whereas basal cells are also quantified.

      We have altered the y-axis title to make it clearer.

      (2) It is not clear why the HA control is sometimes included and sometimes not.

      We included HA staining when it did not disrupt visualization of the loss of fluorescence. As it was uniform in most cases, we excluded it for clarity in some images. HA staining is now included in Fig 3C.

      (3) The authors might reconsider the title, as it is currently somewhat vague, to more precisely represent the content of the manuscript.

      We thank the reviewer for the suggestion. We considered other options but felt that this gave an overview of the breadth of the paper.

      Reviewer #3 (Recommendations for the authors):

      (1) ICs are shown to express Tgm1 and Abca12, important for cornified envelope function and formation of lamellar bodies. Do ICs provide any barrier function at E14.5?

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells.

      (2) Genes associated with contractility are upregulated in ICs and granular cells. And ICs have higher levels of F-actin, MyoIIA, alpha-18, and nuclear Yap. Does this correspond to a measurable difference in stiffness? Can you use AFM to compare the physical properties of ICs, spinous, and granular cells?

      Several recent papers have used AFM of skin sections to probe tissue stiffness. We have not attempted these studies and are unclear about the spatial resolution and whether, in the very thin epidermis at these stages, we could spatially resolve differences. It is also important to note that tissue rigidity is influenced by factors other than contractility. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated that there was a significant increase in this over a tissue-wide scale (Ning et al, Cell Stem Cell, 2021).

      (3) Overexpression of two contractility inducers (spastin and ArhGEF-CA) can induce granular gene expression and repress spinous gene expression, suggesting differentiation lies downstream of contractility. Is contractility required for granular differentiation?

      This is an important question and one that we hope to directly address in the future. Published studies have shown defects in tight junction formation and barrier function in myosin II mutants. However, a thorough characterization of differentiation was not performed.

      (4) ICs are a transient cell type, and it would be important to know what is the consequence of the epidermis never developing this layer. Does it perform an important temporary structural/barrier role, or provide patterning information for the skin?

      We have attempted experiments to ablate intermediate cells with DTA expression - this resulted in inefficient and delayed death and thus did not yield strong conclusions. Our findings that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells.

    1. eLife assessment

      This convincing study advances our understanding of the physiological consequences of the strong overexpression of non-toxic proteins in baker's yeast. The findings suggest that a massive protein burden results in nitrogen starvation and a shift in metabolism likely regulated via the TORC1 pathway, as well as defects in ribosome biogenesis in the nucleolus. The study presents findings and tools that are important for the cell biology and protein homeostasis fields.

    2. Reviewer #1 (Public Review):

      Summary:

      The study "Impact of Maximal Overexpression of a Non-toxic Protein on Yeast Cell Physiology" by Fujita et al. aims to elucidate the physiological impacts of overexpressing non-toxic proteins in yeast cells. By identifying model proteins with minimal cytotoxicity, the authors claim to provide insights into cellular stress responses and metabolic shifts induced by protein overexpression.

      Strengths:

      The study introduces a neutrality index to quantify cytotoxicity and investigates the effects of protein burden on yeast cell physiology. The study identifies mox-YG (a non-fluorescent fluorescent protein) and Gpm1-CCmut (an inactive glycolytic enzyme) as proteins with the lowest cytotoxicity, capable of being overexpressed to more than 40% of total cellular protein while maintaining yeast growth. Overexpression of mox-YG leads to a state resembling nitrogen starvation probably due to TORC1 inactivation, increased mitochondrial function, and decreased ribosomal abundance, indicating a metabolic shift towards more energy-efficient respiration and defects in nucleolar formation.

      Weaknesses:

      While the introduction of the neutrality index seems useful to differentiate between cytotoxicity and protein burden, the biological relevance of the effects of overexpression of the model proteins is unclear.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Fujita et al. characterized the neutrality indexes of several protein mutants in S. cerevisiae and uncovered that mox-YG and Gpm1-CCmut can be expressed at levels as high as 40% of total protein without causing severe growth defects. The authors then looked at the transcriptome and proteome of cells expressing excess mox-YG to investigate how protein burden affects yeast cells. Based on RNA-seq and mass-spectrometry results, the authors uncover that cells with excess mox-YG exhibit nitrogen starvation, respiration increase, inactivated TORC1 response, and decreased ribosomal abundance. The authors further showed that the decreased ribosomal amount is likely due to nucleoli defects, which can be partially rescued by nuclear exosome mutations.

      Strengths:

      Overall, this is a well-written manuscript that provides many valuable resources for the field, including the neutrality analysis on various fluorescent proteins and glycolytic enzymes, as well as the RNA-seq and proteomics results of cells overexpressing mox-YG. Their model on how mox-YG overexpression impairs the nucleolus and thus leads to ribosomal abundance decline will also raise many interesting questions for the field.

      Weaknesses:

      The authors concluded from their RNA-seq and proteomics results that cells with excess mox-YG expression showed increased respiration and TORC1 inactivation. I think it will be more convincing if the authors can show some characterization of mitochondrial respiration/membrane potential and the TOR responses to further verify their -omic results.

      In addition, the authors only investigated how overexpression of mox-YG affects cells. It would be interesting to see whether overexpressing other non-toxic proteins causes similar effects, or if there are protein-specific effects. It would be good if the authors could at least discuss this point, considering that the workload of doing another RNA-seq or mass-spectrometry analysis might be too heavy.

    4. Reviewer #3 (Public Review):

      Summary:

      Protein overexpression is widely used in experimental systems to study the function of the protein, assess its (beneficial or detrimental) effects in disease models, or challenge cellular systems involved in synthesis, folding, transport, or degradation of proteins in general. Especially at very high expression levels, protein-specific effects and general effects of a high protein load can be hard to distinguish. To overcome this issue, Fujita et al. use the previously established genetic tug-of-war system to identify proteins that can be expressed at extremely high levels in yeast cells with minimal protein-specific cytotoxicity (high 'neutrality'). They focus on two versions of the protein mox-GFP, the fluorescent version and a point mutation that is non-fluorescent (mox-YG) and is the most 'neutral' protein on their screen. They find that massive protein expression (up to 40% of the total proteome) results in a nitrogen starvation phenotype, likely inactivation of the TORC1 pathway, and defects in ribosome biogenesis in the nucleolus.

      Strengths:

      This work uses an elegant approach and succeeds in identifying proteins that can be expressed at surprisingly high levels with little cytotoxicity. Many of the changes they see have been observed before under protein burden conditions, but some are new and interesting. This work solidifies previous hypotheses about the general effects of protein overexpression and provides a set of interesting observations about the toxicity of fluorescent proteins (that is alleviated by mutations that render them non-fluorescent) and metabolic enzymes (that are less toxic when mutated into inactive versions).

      Weaknesses:

      The data are generally convincing, however in order to back up the major claim of this work - that the observed changes are due to general protein burden and not to the specific protein or condition - a broader analysis of different conditions would be highly beneficial.

      Major points:

      (1) The authors identify several proteins with high neutrality scores but only analyze the effects of mox/mox-YG overexpression in depth. Hence, it remains unclear which molecular phenotypes they observe are general effects of protein burden or more specific effects of these specific proteins. To address this point, a proteome (and/or transcriptome) of at least a Gpm1-CCmut expressing strain should be obtained and compared to the mox-YG proteome. Ideally, this analysis should be done simultaneously on all strains to achieve a good comparability of samples, e.g. using TMT multiplexing (for a proteome) or multiplexed sequencing (for a transcriptome). If feasible, the more strains that can be included in this comparison, the more powerful this analysis will be and can be prioritized over depth of sequencing/proteome coverage.

      (2) The genetic tug-of-war system is elegant but comes at the cost of requiring specific media conditions (synthetic minimal media lacking uracil and leucine), which could be a potential confound, given that metabolic rewiring, and especially nitrogen starvation are among the observed phenotypes. I wonder if some of the changes might be specific to these conditions. The authors should corroborate their findings under different conditions. Ideally, this would be done using an orthogonal expression system that does not rely on auxotrophy (e.g. using antibiotic resistance instead) and can be used in rich, complex mediums like YPD. Minimally, using different conditions (media with excess or more limited nitrogen source, amino acids, different carbon source, etc.) would be useful to test the robustness of the findings towards changes in media composition.

      (3) The authors suggest that the TORC1 pathway is involved in regulating some of the changes they observed. This is likely true, but it would be great if the hypothesis could be directly tested using an established TORC1 assay.

      (4) The finding that the nucleolus appears to be virtually missing in mox-YG-expressing cells (Figure 6B) is surprising and interesting. The authors suggest possible mechanisms to explain this and partially rescue the phenotype by a reduction-of-function mutation in an exosome subunit. I wonder if this is specific to the mox-YG protein or a general protein burden effect, which the experiments suggested in point 1 should address. Additionally, could a mox-YG variant with a nuclear export signal be expressed that stays exclusively in the cytosol to rule out that mox-YG itself interferes with phase separation in the nucleus?

      Minor points:

      (5) It would be great if the authors could directly compare the changes they observed at the transcriptome and proteome levels. This can help distinguish between changes that are transcriptionally regulated versus more downstream processes (like protein degradation, as proposed for ribosome components).

    5. Author Response:

      The following is the authors' response to the original reviews.

      General response

      (1) Evaluation of mitochondrial activity in mox-YG overexpression cells

      To determine whether the observed "mitochondrial development" seen in transcriptomic, proteomic, and microscopic analyses corresponds to an actual phenotypic shift toward respiration, we measured oxygen consumption in mox-YG overexpression cells. The results showed that oxygen consumption rates were indeed elevated in these cells, suggesting a metabolic shift from fermentation toward respiration. These findings have been incorporated into the revised manuscript as new Figure 4E and Figure 4-figure supplement 9, along with the corresponding descriptions in the Results section.

      (2) Evaluation of TORC1 Pathway Inactivation in mox-YG Overexpression Cells

      While the proteomic response in mox-YG overexpression cells overlapped with known responses to TORC1 pathway inactivation, we had not obtained direct evidence that TORC1 activity was indeed reduced. To address this, we assessed TORC1 activity by testing the effect of rapamycin, a TORC1 inhibitor, and by attempting to detect the phosphorylation state of known TORC1 targets. Our results showed that mox-YG overexpressing cells exhibited reduced sensitivity to rapamycin compared to vector control cells, supporting the idea that TORC1 is already inactivated in the mox-YG overexpression condition.

      In parallel, we attempted to detect phosphorylation of TORC1 targets Sch9 and Atg13 by Western blotting. Specifically, we tested several approaches: detecting phospho-Sch9 using a phospho-specific antibody, assessing the band shift of HA-tagged Sch9, and monitoring Atg13 band shift using an anti-Atg13 antibody. While we were unable to detect Sch9 phosphorylation, likely due to technical limitations, we finally succeeded in detecting Atg13 with the help of our new co-author, Dr. Kamada. However, we observed a marked reduction in Atg13 protein levels in mox-YG overexpression cells, making it difficult to interpret the biological significance of any apparent decrease in phosphorylation. Therefore, we decided not to pursue further experiments on TORC1 phosphorylation within the current revision period.

      These findings have been summarized in new Figure 4-figure supplement 7, and the relevant description has been added to the Results section.

      (3) Phenotypes of Gpm1-CCmut

      We focused our initial analysis on the phenotypes of cells overexpressing mox-YG, the protein with the lowest Neutrality Index (NI) in our dataset, as a model of protein burden. However, it remained unclear to what extent the phenotypes observed in mox-YG overexpression cells are generalizable to protein burden as a whole. We agree with the reviewers' suggestion that it is important to examine whether similar phenotypes are also observed in cells overexpressing Gpm1-CCmut, which was newly identified in this study as having a similarly low NI. We therefore performed validation experiments using Gpm1-CCmut overexpression cells to assess whether they exhibit the characteristic phenotypes observed in mox-YG overexpression cells. These phenotypes included: transcriptional responses, mitochondrial development, metabolic shift toward respiration, and nucleolar shrinkage.

      As a result, mitochondrial development and nucleolar shrinkage were also observed in Gpm1-CCmut overexpression cells, consistent with mox-YG. In contrast, the transcriptional response associated with amino acid starvation and the metabolic shift toward respiration were not observed. Furthermore, an abnormal rounding of cell morphology, absent in mox-YG overexpression cells, was uniquely observed in Gpm1-CCmut cells. These results suggest that the phenotypes observed under mox-YG overexpression may comprise both general effects of protein burden and effects specific to the mox-YG protein. Alternatively, it is possible that Gpm1-CCmut imposes a different kind of constraint or toxicity not shared with mox-YG. In any case, these findings highlight that the full range of phenotypes associated with protein burden cannot yet be clearly defined and underscore the need for future analyses using a variety of "non-toxic" proteins.

      Given that these results form a coherent set, we have relocated original Figure 3, which previously presented the NI values of Gpm1 and Tdh3, to new Figure 6, which now includes all related phenotypic analyses. Correspondingly, we have added new Figure 6-figure supplement 1 through Figure 6-figure supplement 7. The associated results have been incorporated into the Results section, and we have expanded the Discussion to address this point.

      As a result of these revisions, the order of figures has changed from the original version. The correspondence between the original and revised versions is as follows:

      Original → Revised

      Figure 1 → Figure 1
      Figure 2 → Figure 2
      Figure 3 → Figure 6
      Figure 4 → Figure 3
      Figure 5 → Figure 4
      Figure 6 → Figure 5

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      While the introduction of the neutrality index seems useful to differentiate between cytotoxicity and protein burden, the biological relevance of the effects of overexpression of the model proteins is unclear.

      Thank you for your comment. This point is in fact the core message we wished to convey in this study. We believe that every protein possesses some degree of what can be described as "cytotoxicity," and that this should be defined by the expression limit, specifically the threshold level at which growth inhibition occurs. This index corresponds to what we term the neutrality index. We further argue that protein cytotoxicity arises from a variety of constraints inherent to each protein. These constraints act in a stepwise manner to determine the expression limit (i.e., the neutrality) of a given protein (Figure 1A). To demonstrate the real existence of such constraints, there are two complementary approaches: an inductive one that involves large-scale, systematic investigation of naturally occurring proteins, and a deductive one that tests hypotheses using selected model proteins. Our current study follows the latter approach. In addition, we define protein burden as a phenomenon that can only be elicited by proteins that are ultimately harmless (Figure 1B). We assume that such burden results in a shared physiological state, such as depletion of cellular resources. Through continued efforts to identify a protein suitable for investigating this phenomenon, we eventually arrived at mox-YG. As the reviewer rightly pointed out, examining only mox-YG does not reveal the full picture of protein burden. In fact, in response to the reviewer's suggestion, we investigated the physiological consequences of overexpressing a mutant glycolytic protein, Gpm1-CCmut (General Response 3). We found that the resulting phenotype was notably different from that observed in cells overexpressing mox-YG. Going forward, we believe that our study provides a foundation for further systematic exploration of "harmless proteins" and the cellular impacts of their overexpression.

      Reviewer #2 (Public Review):

      Weaknesses:

      The authors concluded from their RNA-seq and proteomics results that cells with excess mox-YG expression showed increased respiration and TORC1 inactivation. I think it will be more convincing if the authors can show some characterization of mitochondrial respiration/membrane potential and the TOR responses to further verify their -omic results.

      These points are addressed in General Response 1 and 2.

      In addition, the authors only investigated how overexpression of mox-YG affects cells. It would be interesting to see whether overexpressing other non-toxic proteins causes similar effects, or if there are protein-specific effects. It would be good if the authors could at least discuss this point, considering that the workload of doing another RNA-seq or mass-spectrometry analysis might be too heavy.

      These points are addressed in General Response 3.

      Reviewer #3 (Public Review):

      Weaknesses:

      The data are generally convincing, however in order to back up the major claim of this work - that the observed changes are due to general protein burden and not to the specific protein or condition - a broader analysis of different conditions would be highly beneficial.

      These points are addressed in General Response 3.

      Major points:

      (1) The authors identify several proteins with high neutrality scores but only analyze the effects of mox/mox-YG overexpression in depth. Hence, it remains unclear which molecular phenotypes they observe are general effects of protein burden or more specific effects of these specific proteins. To address this point, a proteome (and/or transcriptome) of at least a Gpm1-CCmut expressing strain should be obtained and compared to the mox-YG proteome. Ideally, this analysis should be done simultaneously on all strains to achieve a good comparability of samples, e.g. using TMT multiplexing (for a proteome) or multiplexed sequencing (for a transcriptome). If feasible, the more strains that can be included in this comparison, the more powerful this analysis will be and can be prioritized over depth of sequencing/proteome coverage.

      This comment has been addressed in General Response 3. Gpm1-CCmut overexpression cells exhibited both phenotypes that were shared with, and distinct from, those observed in mox-YG overexpression cells. To define a unified set of phenotypes associated with "protein burden," we believe that extensive omics analyses targeting multiple "non-toxic" protein overexpression strains will be necessary. However, such an effort goes beyond the scope of the current study, and we would like to leave it as an important subject for future investigation.

      (2) The genetic tug-of-war system is elegant but comes at the cost of requiring specific media conditions (synthetic minimal media lacking uracil and leucine), which could be a potential confound, given that metabolic rewiring, and especially nitrogen starvation are among the observed phenotypes. I wonder if some of the changes might be specific to these conditions. The authors should corroborate their findings under different conditions. Ideally, this would be done using an orthogonal expression system that does not rely on auxotrophy (e.g. using antibiotic resistance instead) and can be used in rich, complex mediums like YPD. Minimally, using different conditions (media with excess or more limited nitrogen source, amino acids, different carbon source, etc.) would be useful to test the robustness of the findings towards changes in media composition.

      We appreciate the reviewer's clear understanding of both the advantages and limitations of the gTOW system. As rightly pointed out, since our system relies on leucine depletion, it is essential to carefully consider the potential impact this may have on cellular metabolism. Another limitation of the gTOW system, though it also serves as one of its strengths, is its reliance on copy number variation to achieve protein overexpression. This feature limits the possibility of observing rapid responses, as immediate induction is not feasible. To address this issue, we have recently developed a strong and inducible promoter that minimizes effects on other metabolic systems (Higuchi et al., 2024), and we believe this tool will be essential in future experiments.

      In response to the reviewer's comments, we conducted two additional sets of experiments. First, we established a new overexpression system in nutrient-rich conditions (YPD medium) that is conceptually similar to gTOW but uses aureobasidin A and the AUR1d resistance gene to promote gene amplification (new Figure 4-figure supplement 2). Using this system, we observed that non-fluorescent YG mutants led to increased expression of mox. Total protein levels appeared to rise correspondingly, suggesting that the overall synthetic capacity of cells might be higher in YPD compared to SC medium. However, the degree of overexpression achieved in this system was insufficient to strongly inhibit growth, meaning we could not replicate the stress conditions observed with the original gTOW system. Further studies will be needed to determine whether stronger induction under these nutrient-rich conditions will yield comparable responses.

      Second, we performed a control experiment to examine whether the amino acid starvation response observed in mox-YG overexpressing cells could be attributed to leucine depletion from the medium (new Figure 3-figure supplement 3). By titrating leucine concentrations in SC medium, we confirmed that lower leucine levels reduced the growth rate of vector control cells, indicating leucine limitation. However, GAP1 induction was not observed under these conditions. In contrast, mox-YG overexpression led to strong GAP1 induction under similar growth-inhibitory conditions, suggesting that the amino acid starvation response is not simply due to environmental leucine depletion, but rather a consequence of the cellular burden imposed by mox-YG overexpression.

      These findings have been incorporated into the manuscript, along with the corresponding figures (new Figure 4-figure supplement 2, Figure 3-figure supplement 3), and relevant descriptions have been added to the Results and Discussion sections.

      (3) The authors suggest that the TORC1 pathway is involved in regulating some of the changes they observed. This is likely true, but it would be great if the hypothesis could be directly tested using an established TORC1 assay.

      This comment has been addressed in General Response 2. We assessed the rapamycin sensitivity of mox-YG overexpression cells, which was found to be reduced, and attempted to detect phosphorylation of the TORC1 target Atg13, although the latter was only partially successful. These findings have been incorporated into the Results section.

      (4) The finding that the nucleolus appears to be virtually missing in mox-YG-expressing cells (Figure 6B) is surprising and interesting. The authors suggest possible mechanisms to explain this and partially rescue the phenotype by a reduction-of-function mutation in an exosome subunit. I wonder if this is specific to the mox-YG protein or a general protein burden effect, which the experiments suggested in point 1 should address. Additionally, could a mox-YG variant with a nuclear export signal be expressed that stays exclusively in the cytosol to rule out that mox-YG itself interferes with phase separation in the nucleus?

      As also described in our General Response 3, we observed nucleolar shrinkage upon Gpm1-CCmut overexpression as well (new Figure 6E and Figure 6-figure supplement 7), suggesting that this phenomenon may represent a general feature of protein burden. The reviewer's suggestion to test whether this effect persists when mox-YG is excluded from the nucleus is indeed intriguing. However, based on our previous work, we have shown that overexpression of NES-tagged proteins (e.g., NES-EGFP) causes severe growth inhibition due to depletion of nuclear export factors (Kintaka et al., 2020). Unfortunately, this technical limitation makes it difficult for us to carry out the proposed experiment as suggested.

      Minor points:

      (5) It would be great if the authors could directly compare the changes they observed at the transcriptome and proteome levels. This can help distinguish between changes that are transcriptionally regulated versus more downstream processes (like protein degradation, as proposed for ribosome components).

      We also considered this point to be important, and therefore compared the transcriptomic and proteomic changes associated with mox-YG overexpression. However, somewhat unexpectedly, we found little correlation between these two layers of response. As shown in new Figures 3 and 4 (original Figures 4 and 5), while genes related to oxidative phosphorylation were consistently upregulated at both the mRNA and protein levels in mox-YG overexpressing cells, ribosomal proteins showed a discordant pattern: their mRNA levels were significantly increased, whereas their protein levels were significantly decreased.

      Several factors may explain this discrepancy: (1) differences in analytical methods between transcriptomics and proteomics; (2) temporal mismatches arising from the dynamic changes in mRNA and protein expression during batch culture; and (3) the possibility that, under protein burden conditions, specific regulatory mechanisms may govern the selective translation or targeted degradation of certain proteins. However, at this point, we were unable to clearly determine which of these factors accounts for the observed differences.

      For this reason, we did not originally include a global transcriptome–proteome comparison in the manuscript. In response to the reviewer’s comment, however, we have now included the comparison data (new Figure 4-figure supplement 3D).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) While the study provides a detailed description of physiological changes, the underlying mechanisms remain speculative. For example, the exact reasons for nitrogen source depletion or increased respiration are unclear. The transcriptomic and proteomic data should be complemented by basic growth assay tests on rapamycin or glycerol to strengthen these observations.

      This comment has been addressed in General Responses 1 and 2. We conducted oxygen consumption assays and growth assays in the presence of rapamycin, and incorporated these results into the revised version of the manuscript.

      We also performed culture experiments using glycerol as a carbon source. However, both the vector control and mox-YG overexpression cells showed extremely poor growth. Although there was a slight difference between the two, we judged that it would be difficult to draw any meaningful conclusions from these results. Therefore, we have chosen not to include them in the main text (the data are attached below for reference).

      Author response image 1.

      (2) The study mainly focuses on two proteins, mox-YG/FP proteins and Gpm1-CCmut. Did the authors also look at a broader range of proteins with varying degrees of cytotoxicity, such as known cytotoxic proteins, to validate the neutrality index and generalize their findings?

      In our calculation of the Neutrality Index (NI), we use two parameters: the maximum growth rate (expressed as %MGR relative to the control) and the protein expression level. For the latter, we measure the abundance of the overexpressed protein as a percentage of total cellular protein, based on the assumption that the protein is expressed at a sufficiently high level to be detectable by SDS-PAGE. In our view, proteins typically regarded as “cytotoxic” cannot be overexpressed to levels detectable by SDS-PAGE without the use of more sensitive techniques such as Western blotting. This limitation in expression itself is an indication of their high cytotoxicity. Consequently, for such proteins, NI is determined solely by the MGR value, and will inherently fall below 100.

      To test whether this interpretation is valid, we re-evaluated a group of EGFP variants previously reported by us to exhibit higher cytotoxicity than EGFP (Kintaka et al., 2016), due to overloading of specific cellular transport pathways. These include EGFPs tagged with localization signals. At the time of the original study, we had not calculated their NI values. Upon re-analysis, we found that all of these localization-tagged EGFP variants indeed have NI values below 100.

      This result has been included as a new Figure 2-figure supplement 3, and the relevant descriptions have been added to the Results section.

      (3) The partial rescue of ribosomal biosynthesis defects by a mutation in the nuclear exosome is intriguing but not fully explored. The specific role of the nuclear exosome in managing protein burden remains unclear. This result could be supported by alternative experiments. For example, would tom1 deletion or proteasome inhibition (degradation of ribosomal proteins in the nucleus) partially rescue the nuclear formation?

      As described in the main text, our interest in exosome mutants was prompted by our previous SGA (Synthetic Genetic Array) analysis, in which these mutants exhibited positive genetic interactions with GFP overexpression; that is, they acted in a rescuing manner (Kintaka et al., 2020). In contrast, proteasome mutants did not show such positive interactions in the same screening. On the contrary, proteasome mutants that displayed negative genetic interactions have been identified, such as the pre7ts mutant. Furthermore, the proteasome is involved in various aspects of proteostasis beyond just orphan ribosomal proteins, making the interpretation of its effects potentially quite complex.

      Regarding the TOM1 mutant raised by the reviewer, we attempted to observe nucleolar morphology using the NSR1-mScarlet-I marker in the tom1Δ deletion strain. However, we were unsuccessful in constructing the strain. This failure may be due to the strong detrimental effects of this perturbation in the tom1Δ background. As we were unable to complete this experiment within the revision period, we would like to address this issue in future work.

      Minor comments:

      (1) It would be interesting to include long-term cellular and evolutionary responses to protein overexpression to understand how cells adapt to chronic protein burden.

      Thank you for the suggestion. We are currently conducting experiments related to these points. However, as they fall outside the scope of the present study, we would like to refrain from including the data in this manuscript.

      (2) The microscopy of Nsr1 in Figure 6G does not clearly demonstrate the restored formation of the nucleolus in the mtr4-1 mutant. Electron microscopy images would be a better demonstration.

      The restoration of nucleolar size in the mtr4-1 mutant, as shown in Figure 5-figure supplement 5 (original Figure 6_S5), is statistically significant. However, as described in the main text, the degree of rescue by the mutation is partial, and, as the reviewer notes, not clearly distinguishable by eye. It becomes apparent only when analyzing a large number of cells, allowing for detection as a statistically significant difference. Given that electron microscopy images are inherently limited in the number of cells that can be analyzed and pose challenges for statistical evaluation, we believe it would be difficult to detect such a subtle difference using this method. Therefore, we respectfully ask for your understanding that we will not include additional EM experiments in this revision.

      (3) On page 24, line 451 it says that of the 84 ribosomal proteins... latest reviews and structures described/ identified 79 ribosomal proteins in budding yeast of which the majority are incorporated into the pre-ribosomal particles in the nucleolus. We could not find this information in the provided reference. Please align with the literature.

      Thank you for the comment. In S. cerevisiae, many ribosomal protein genes are duplicated due to gene duplication events, resulting in a total of 136 ribosomal proteins (http://ribosome.med.miyazaki-u.ac.jp/rpg.cgi?mode=genetable). However, not all of them are duplicated, and among the duplicated pairs, some can be distinguished by proteomic analysis based on differences in amino acid sequences, while others cannot. As a result, we report that 84 ribosomal proteins were “detected” in our proteomic analysis. To avoid confusion, we have added the following explanation to the legend of Figure 5-figure supplement 1 (original Figure 6_S1):

      “Note that when the amino acid sequences of paralogs are identical, they cannot be distinguished by proteomic analysis, and the protein abundance of both members of the paralog pair is represented under the name of only one.”

      Reviewer #2 (Recommendations for the authors):

      (1) The authors mentioned that based on their proteomics results, overexpressing mox-YG appears to increase respiration. I think it is worth doing some quick verification, such as oxygen consumption experiments or mitochondrial membrane potential staining to provide some verification on that.

      This comment has been addressed in General Response 1. We measured oxygen consumption in mox-YG overexpression cells and found that it was indeed elevated, suggesting a metabolic shift from fermentation toward aerobic respiration.

      (2) Similar to point 1, the authors concluded from their proteomics data that the mox-YG overexpression induced responses that are similar to TORC1 inactivation. It might be worth testing whether there is any actual TORC1 inactivation, e.g. by detecting whether there is reduced Sch9 phosphorylation by western blot.

      This comment has been addressed in General Response 2. We assessed the rapamycin sensitivity of mox-YG overexpression cells, which was found to be reduced, and attempted to detect phosphorylation of the TORC1 target Atg13, although the latter was only partially successful. These findings have been incorporated into the Results section.

      (3) The authors showed that overexpressing excess mox-YG caused downregulated glycolysis pathways. It is worth discussing whether overexpressing glycolysis-related non-toxic proteins such as Gpm1-CCmut will also lead to similar results.

      This comment has been addressed in General Response 3. Gpm1-CCmut overexpression cells exhibited both phenotypes shared with mox-YG overexpression and distinct ones. These findings suggest that a unified set of phenotypes associated with "protein burden" has yet to be clearly defined, and further investigation will be necessary to elucidate this.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors identify several proteins with high neutrality scores but only analyze the effects of mox/mox-YG overexpression in depth. Hence, it remains unclear which molecular phenotypes they observe are general effects of protein burden or more specific effects of these specific proteins. To address this point, a proteome (and/or transcriptome) of at least a Gpm1-CCmut expressing strain should be obtained and compared to the mox-YG proteome. Ideally, this analysis should be done simultaneously on all strains to achieve a good comparability of samples, e.g. using TMT multiplexing (for a proteome) or multiplexed sequencing (for a transcriptome). If feasible, the more strains that can be included in this comparison, the more powerful this analysis will be and can be prioritized over depth of sequencing/proteome coverage.

      This comment has been addressed in General Response 3. Gpm1-CCmut overexpression cells exhibited both phenotypes that were shared with, and distinct from, those observed in mox-YG overexpression cells. To define a unified set of phenotypes associated with "protein burden," we believe that extensive omics analyses targeting multiple "non-toxic" protein overexpression strains will be necessary. However, such an effort goes beyond the scope of the current study, and we would like to leave it as an important subject for future investigation.

      (2) The genetic tug-of-war system is elegant but comes at the cost of requiring specific media conditions (synthetic minimal media lacking uracil and leucine), which could be a potential confound, given that metabolic rewiring, and especially nitrogen starvation are among the observed phenotypes. I wonder if some of the changes might be specific to these conditions. The authors should corroborate their findings under different conditions. Ideally, this would be done using an orthogonal expression system that does not rely on auxotrophy (e.g. using antibiotic resistance instead) and can be used in rich, complex mediums like YPD. Minimally, using different conditions (media with excess or more limited nitrogen source, amino acids, different carbon source, etc.) would be useful to test the robustness of the findings towards changes in media composition.

      We appreciate the reviewer’s clear understanding of both the advantages and limitations of the gTOW system. As rightly pointed out, since our system relies on leucine depletion, it is essential to carefully consider the potential impact this may have on cellular metabolism. Another limitation of the gTOW system, though it also serves as one of its strengths, is its reliance on copy number variation to achieve protein overexpression. This feature limits the possibility of observing rapid responses, as immediate induction is not feasible. To address this issue, we have recently developed a strong and inducible promoter that minimizes effects on other metabolic systems (Higuchi et al., 2024), and we believe this tool will be essential in future experiments.

      In response to the reviewer’s comments, we conducted two additional sets of experiments. First, we established a new overexpression system in nutrient-rich conditions (YPD medium) that is conceptually similar to gTOW but uses aureobasidin A and the AUR1d resistance gene to promote gene amplification (new Figure 4-figure supplement 2). Using this system, we observed that non-fluorescent YG mutants led to increased expression of mox. Total protein levels appeared to rise correspondingly, suggesting that the overall synthetic capacity of cells might be higher in YPD compared to SC medium. However, the degree of overexpression achieved in this system was insufficient to strongly inhibit growth, meaning we could not replicate the stress conditions observed with the original gTOW system. Further studies will be needed to determine whether stronger induction under these nutrient-rich conditions will yield comparable responses.

      Second, we performed a control experiment to examine whether the amino acid starvation response observed in mox-YG overexpressing cells could be attributed to leucine depletion from the medium (new Figure 3-figure supplement 3). By titrating leucine concentrations in SC medium, we confirmed that lower leucine levels reduced the growth rate of vector control cells, indicating leucine limitation. However, GAP1 induction was not observed under these conditions. In contrast, mox-YG overexpression led to strong GAP1 induction under similar growth-inhibitory conditions, suggesting that the amino acid starvation response is not simply due to environmental leucine depletion, but rather a consequence of the cellular burden imposed by mox-YG overexpression.

      These findings have been incorporated into the manuscript, along with the corresponding figures (new Figure 4-figure supplement 2, Figure 3-figure supplement 3), and relevant descriptions have been added to the Results and Discussion sections.

      (3) The authors suggest that the TORC1 pathway is involved in regulating some of the changes they observed. This is likely true, but it would be great if the hypothesis could be directly tested using an established TORC1 assay.

      This comment has been addressed in General Response 2. We assessed the rapamycin sensitivity of mox-YG overexpression cells, which was found to be reduced, and attempted to detect phosphorylation of the TORC1 target Atg13, although the latter was only partially successful. These findings have been incorporated into the Results section.

      (4) The finding that the nucleolus appears to be virtually missing in mox-YG-expressing cells (Figure 6B) is surprising and interesting. The authors suggest possible mechanisms to explain this and partially rescue the phenotype by a reduction-of-function mutation in an exosome subunit. I wonder if this is specific to the mox-YG protein or a general protein burden effect, which the experiments suggested in point 1 should address. Additionally, could a mox-YG variant with a nuclear export signal be expressed that stays exclusively in the cytosol to rule out that mox-YG itself interferes with phase separation in the nucleus?

      As also described in our General Response 3, we observed nucleolar shrinkage upon Gpm1-CCmut overexpression as well (new Figure 6E and 6-figure supplement 7), suggesting that this phenomenon may represent a general feature of protein burden. The reviewer’s suggestion to test whether this effect persists when mox-YG is excluded from the nucleus is indeed intriguing. However, based on our previous work, we have shown that overexpression of NES-tagged proteins (e.g., NES-EGFP) causes severe growth inhibition due to depletion of nuclear export factors (Kintaka et al., 2020). Unfortunately, this technical limitation makes it difficult for us to carry out the proposed experiment as suggested.

      (5) It would be great if the authors could directly compare the changes they observed at the transcriptome and proteome levels. This can help distinguish between changes that are transcriptionally regulated versus more downstream processes (like protein degradation, as proposed for ribosome components).

      We also considered this point to be important, and therefore compared the transcriptomic and proteomic changes associated with mox-YG overexpression. However, somewhat unexpectedly, we found little correlation between these two layers of response. As shown in new Figures 3 and 4 (original Figures 4 and 5), while genes related to oxidative phosphorylation were consistently upregulated at both the mRNA and protein levels in mox-YG overexpressing cells, ribosomal proteins showed a discordant pattern: their mRNA levels were significantly increased, whereas their protein levels were significantly decreased.

      Several factors may explain this discrepancy: (1) differences in analytical methods between transcriptomics and proteomics; (2) temporal mismatches arising from the dynamic changes in mRNA and protein expression during batch culture; and (3) the possibility that, under protein burden conditions, specific regulatory mechanisms may govern the selective translation or targeted degradation of certain proteins. However, at this point, we were unable to clearly determine which of these factors accounts for the observed differences.

      For this reason, we did not originally include a global transcriptome–proteome comparison in the manuscript. In response to the reviewer’s comment, however, we have now included the comparison data (new Figure 4-figure supplement 3D).

      Minor points:

      (1) The authors repeatedly state that 'mitochondrial function' is increased. This is inaccurate in two ways: first, mitochondria have multiple functions, and it should be specified which one is referred to (probably mitochondrial respiration); second, the claim is based solely on the abundance of transcripts/proteins, which may or may not reflect increased activity.

      The authors should either perform functional tests (e.g. measure oxygen consumption or extracellular acidification), or change their wording to more accurately reflect the findings.

      To more directly reflect our findings, we revised two instances of the phrase “mitochondrial function” to “mitochondrial proteins” in the manuscript. Furthermore, as described in General Response 1, we confirmed that oxygen consumption is elevated in mox-YG overexpression cells. This observation suggests that mitochondrial respiratory activity is indeed enhanced under these conditions.

      (2) Similarly, the authors state that FPs are 'not localized' (e.g. line 137). This should be specified (e.g. 'not actively sorted into cellular compartments other than the cytosol').

      As pointed out by the reviewer, we have revised the relevant sections accordingly.

      (3) In Figure 4D, some of the reporter assays don't fully recapitulate the RNAseq findings (e.g. for PHO84 and ZPS1, where mox-FS and mox-YG behave differently in the reporter assay, but not in the RNAseq data). This may stem from technical limitations given that the reporter assay relies on RFP expression which could generally be affected by protein overexpression (cf. ACT1pro in mox-FS), but it should be mentioned in the text.

      We apologize for the confusion caused by our insufficient explanation of "moxFS" in new Figure 3D (original Figure 4D). As clarified here, "moxFS" refers to a frameshift mutant in which the mRNA is transcribed but the protein is not translated due to an early frameshift mutation. This is not a functional mox protein. The behavior of this mutant is nearly identical to that of the vector control, indicating that the transcriptional response observed in this assay is not triggered by mRNA expression itself, but rather by events occurring after protein synthesis begins. Importantly, the transcriptional responses identified by RNA-seq in mox-YG overexpression cells are largely recapitulated by this reporter assay, supporting the reliability of our experimental design.

      We appreciate the reviewer’s comment, which helped us recognize the lack of clarity in our original description. In response, we have added an explanation of the FS mutation to the figure legend (new Figure 3D), and we have also expanded the description of the moxFS experimental results in the Results section.

    1. eLife Assessment

      The authors attempt to identify which patients with benign lesions will progress to cancer using a liquid biomarker. Although the study is valuable, the evidence provided for the liquid biopsy EV miRNA signature developed based on radiomics features remains incomplete. There remain key details missing and validation experiments that would better support the conclusions of the study.

    2. Reviewer #1 (Public review):

      Summary:

      The study aimed to develop a liquid biopsy EV miRNA signature associated with radiomics features for early diagnosis of pancreatic cancer. A flawed study design and an inadequate description of the clinical characteristics of the enrolled samples make the findings unconvincing.

      Strengths:

      The concept of developing an EV miRNA signature associated with disease-relevant radiomics features is a strength.

      Weaknesses:

      There are many weaknesses in this manuscript, including associations drawn between data derived from unmatched sample sets, the selection of low-abundance miRNAs for developing the signature with inadequate rationale, an incomplete description of experimental methods, and confusing statements in the text.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates a low abundance microRNA signature in extracellular vesicles to subtype pancreatic cancer and for early diagnosis. In this revision, there remain several major and minor issues.

      Strengths:

      The authors did a comprehensive job with numerous analyses of moderately sized cohorts to describe the clinical and translational significance of their miRNA signature.

      Weaknesses:

      The weaknesses of the study largely revolve around a lack of clarity about the methodology used and the validation of their findings.

      (1) The WGCNA analysis was critical to identify the EV miRNAs associated with imaging features, but the "cut-off criteria" for MM and GS have no clear justification. How were these cut-offs determined? How sensitive were the results to these cut-offs?

      (2) The authors now clarify that patients for the sub-study on differentiating early stage from benign pancreatic lesions were matched by age and that the benign pancreatic lesions were predominantly IPMNs. This scientific design is flawed. The CT features extracted likely differentiate solid from cystic pancreatic lesions, and the miRNA signature is doing the same. The authors need to incorporate the following benign controls into their imaging analysis and their EV miRNA analysis: pancreatitis and normal pancreata.

      (3) For the radiomics features, the authors should include an additional external validation set to better support the ability to use these features reproducibly, especially given that the segmentation was manual and reliant on specific people.

      (4) The DF selection process still lacks cited references as originally requested in the first review.

      (5) In Figure 2, more quantitative details are needed in the manuscript. The authors failed to incorporate this and only responded in their rebuttal. Add details to the manuscript as originally requested.

      (6) It is still not clear what Figure 4A is illustrating with regard to model performance. The authors need to state in the manuscript very clearly what they are showing in the figure and what the modules represent.

      (7) Figure 5 and the descriptions for the public serum miRNA datasets need more details. Were these pancreatic cancers all adenocarcinomas, and what were the stage, age range, sex distribution, and comorbid conditions of the cases? Were the controls all IPMNs or were there other conditions in the controls?

      (8) The subtype results in figures 6 and 7 are not convincing. An association on univariate analysis is not sufficient. The explanation that clinical data is not available to do a multivariable analysis indicates that the authors do not have the ability to claim that they have identified unique subtypes that have clinical relevance. A thorough evaluation of the prognostic significance and the associated molecular features of these tumors is needed.

      Summary:

      There remain key details and validation experiments to better support the conclusions of the study.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Shi et al. has utilized multiple imaging datasets and one set of samples for analyzing serum EV-miRNAs & EV-RNAs to develop an EV miRNA signature associated with disease-relevant radiomics features for early diagnosis of pancreatic cancer. CT imaging features in two datasets (UMMD & JHC and WUH) were derived from patients with benign pancreatic disease vs pancreatic cancer cases, while circulating EV miRNAs were profiled from samples obtained from a different center (DUH). The EV RNA signature from external public datasets (GSE106817, GSE109319, GSE113486, GSE112264) was analyzed for differences in healthy controls vs pancreatic cancer cases. The miRNAs were also analyzed in the TCGA tissue miRNA data from normal adjacent tissue vs pancreatic cancer.

      Strengths:

      The concept of developing EV miRNA signatures associated with disease relevant radiomics features is a strength.

      Weaknesses:

      While the overall concept of developing an EV miRNA signature associated with radiomics features is interesting, the findings reported are not convincing for the reasons outlined below:

      (1) Discrepant datasets for analyzing radiomic features with EV-miRNAs: It is not justified how CT images (UMMD & JHC and WUH) and EV-miRNAs (DUH) on different subjects and centers/cohorts shown in Figures 1 &2 were analyzed for association. It is stated that the samples were matched according to age but there is no information provided for the stages of pancreatic cancer and the kind of benign lesions analyzed in each instance.

      Thank you to the reviewer for the valuable comments. We acknowledge that the radiomics data and EV-miRNA data were derived from different patient cohorts. The primary aim of this study was to explore the integration of data from different omics sources in an exploratory manner to identify potential shared biological features.

      We have revised the Methods section accordingly. Regarding the imaging data, we mainly performed batch effect correction on CT images from different centers to eliminate variability. As you correctly pointed out, the EV-miRNA data and CT images from DUH were matched by age. Since all the patients we included had early-stage pancreatic cancer, and the benign pancreatic lesions were predominantly IPMN, we did not specifically highlight this aspect. However, we have now clarified this approach in the data collection section. Thank you for your attention.

      (2) The study is focused on low-abundance miRNAs with no adequate explanation of the selection criteria for the miRNAs analyzed.

      We used the median absolute deviation (MAD) to filter low-abundance miRNAs in the manuscript. Because we introduce this concept here for the first time in this context, we acknowledge that there is still considerable room for refinement and improvement.
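
      As a minimal sketch of what a MAD-based filter could look like: the response does not state the threshold, the input scale, or which side of the cutoff was retained, so the function below simply splits miRNAs into two groups around a hypothetical cutoff. The miRNA names, expression values, and the `threshold` parameter are illustrative assumptions, not the authors' data or code.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def split_by_mad(expression, threshold):
    """Split miRNAs into (at_or_below, above) groups by their MAD across samples.

    `expression` maps a miRNA name to its per-sample values.
    `threshold` is a hypothetical cutoff; the source does not give one.
    """
    at_or_below, above = {}, {}
    for name, values in expression.items():
        target = above if mad(values) > threshold else at_or_below
        target[name] = values
    return at_or_below, above


# Toy expression table (invented values, for illustration only).
expression = {
    "miR-A": [0.0, 0.0, 0.1, 0.0, 0.0],
    "miR-B": [2.0, 5.0, 3.0, 8.0, 1.0],
    "miR-C": [1.0, 1.2, 0.9, 1.1, 1.0],
}
low_mad, high_mad = split_by_mad(expression, threshold=0.5)
```

      Returning both groups keeps the sketch neutral about the direction of the filter, which the manuscript would need to specify.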

      (3) While EV-miRNAs were profiled or sequenced (not well described in the Methods section) with two different EV isolation methods, the authors used four public datasets of serum circulating miRNAs to validate the findings. It would be better to show the expression of the three miRNAs in the additional dataset(s) of EV-miRNAs and compare the expressions of the three EV-miRNAs in pancreatic cancer with healthy and benign disease controls.

      Thank you for your suggestion. We have attempted to identify available EV-miRNA datasets; however, due to current limitations in data access, we opted to use serum samples for validation. In our follow-up studies, we are already in the process of collecting relevant EV samples for further validation.

      (4) It is not clear how the 12 EV-miRNAs in Figure 4C were identified.

      These 12 EV-miRNAs were identified through WGCNA analysis and are associated with the high-risk group.

      (5) Box plots in Figures 4D-F and G-I of three miRNAs in serum and tissue should show all quantitative data points.

      We have completed the revisions. Kindly review them at your convenience.

      (6) What is the GBM model in Figure 5?

      Thank you to the reviewer for raising this question. The "GBM model" referred to in Figure 5 is a classification model built using the Gradient Boosting Machine (GBM) algorithm, designed to predict the diagnostic status of pancreatic cancer by integrating EV-miRNA expression and radiomics features. We implemented the model using the `GradientBoostingClassifier` from the scikit-learn library (version 1.2.2), and optimized the model’s hyperparameters (learning rate, maximum depth, and number of trees) within a five-fold cross-validation framework. The training process and performance evaluation of the model, including the ROC curve and AUC values, are presented in Figure 5.
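
      The workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the synthetic data from `make_classification` stands in for the EV-miRNA and radiomics features, the grid values are invented, and `GridSearchCV` is only one of several ways to tune hyperparameters within five-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the feature matrix (assumption: 8 features, binary label).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the three hyperparameters named in the response.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}

# Five-fold cross-validated search, scored by ROC AUC.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on held-out data.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

      Holding out a test split that the search never sees keeps the reported AUC separate from the data used to choose hyperparameters.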

      (7) What are the AUCs of individual EV-miRNAs integrated as a panel of three EV-miRNAs?

      Thank you for your comment. Our GBM model integrates the panel of these three EV-miRNAs.

      (8) The authors could have compared the performance of CA19-9 with that of the three EV-miRNAs.

      Since our main focus is on the panel of three EV-miRNAs, we did not present the AUC for each individual miRNA separately. However, we have included the performance of CA19-9 in our dataset as a reference. The predictive AUC for CA19-9 is 0.843 (95% CI, 0.762–0.924).
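
      For readers unfamiliar with how an AUC and its confidence interval can be obtained for a single marker such as CA19-9, here is a minimal sketch. The response does not say how the interval was computed; the percentile bootstrap used below is an assumption, and the scores and labels are invented toy values.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (one possible method, assumed here)."""
    rng = random.Random(seed)
    indices = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        sampled_labels = [labels[i] for i in sample]
        # Skip resamples that lack one of the two classes.
        if 0 < sum(sampled_labels) < len(sampled_labels):
            stats.append(auc([scores[i] for i in sample], sampled_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy marker values (1 = cancer, 0 = control); not real CA19-9 data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
point_estimate = auc(scores, labels)
lower, upper = bootstrap_ci(scores, labels)
```

      The same two functions could be applied to each EV-miRNA individually, which is the comparison the reviewer asked for in points (7) and (8).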

      (9) How was the diagnostic performance of the three EV-miRNAs in the two molecular subtypes identified in Figure 6&7? Do the C1 & C2 clusters correlate with the classical/basal subtypes, staging, and imaging features?

      Thank you to the reviewer for raising this important question. In fact, our EV panel is primarily designed to distinguish between normal and tumor samples, whereas both C1 and C2 represent tumor subtypes, and thus the panel is not applicable for diagnostic purposes in this context. Additionally, our subtypes are novel and do not align with the conventional classical and basal-like gene expression profiles. Furthermore, the C1 subtype is more frequently observed in stage III tumors (Figure 6J) and is associated with distinct imaging features such as higher texture heterogeneity and lower CT density.

      Reviewer #2 (Public review):

      Summary:

      This study investigates a low abundance microRNA signature in extracellular vesicles to subtype pancreatic cancer and for early diagnosis. There are several major questions that need to be addressed. Numerous minor issues are also present.

      Strengths:

      The authors did a comprehensive job with numerous analyses of moderately sized cohorts to describe the clinical and translational significance of their miRNA signature.

      Weaknesses:

      There are multiple weaknesses of this study that should be addressed:

      (1) The description of the datasets in the Materials and Methods lacks details. What were the benign lesions from the various hospital datasets? What were the healthy controls from the public datasets? No pancreatic lesions? No pancreatic cancer? Any cancer history or other comorbid conditions? Please define these better.

      We sincerely thank the reviewer for the detailed and important suggestions regarding sample definition. Indeed, the source of the datasets and the definition of control groups are critical for ensuring the rigor and interpretability of the study. In response to this comment, we have added clarifications in the revised "Materials and Methods" section.

      First, for the benign lesion group derived from various clinical centers (DUH, UMMD, WUH, etc.), we have carefully reviewed the pathological and clinical records and defined these samples as histologically confirmed non-malignant pancreatic lesions, primarily IPMN. All patients in the benign lesion group had no diagnosis of pancreatic cancer at the time of sample collection, and for cohorts with available follow-up data, no evidence of malignant progression was observed within at least six months.

      Second, the healthy control group from public databases was derived from healthy individuals.

      Finally, to eliminate potential confounding factors, we excluded any samples with a history of other malignancies (e.g., breast cancer, colorectal cancer, etc.) from all datasets with available clinical information, to ensure the specificity of the EV-miRNA expression analysis.

      (2) It is unclear how many of the controls and cases had both imaging for radiomics and blood for biomarkers.

      Due to limitations in resource availability, our study does not include samples with both CT imaging and serological data from the same individuals. Instead, we integrated blood samples and CT imaging data collected from different clinical centers.

      (3) The authors should define the imaging methods and protocols used in more detail. For the CT scans, what slice thickness? Was a pancreatic protocol used? What phase of contrast is used (arterial, portal venous, non-contrast)? Any normalization or pre-processing?

      Thank you to the reviewer for the professional suggestions regarding the imaging section. We have added detailed technical information on CT imaging in the revised Materials and Methods section. All CT images were acquired using a 64-slice multidetector spiral CT scanner, with a standard slice thickness of 1.0–1.5 mm and a reconstruction interval of 1 mm. All pancreatic cancer patients underwent a standard pancreatic protocol triphasic contrast-enhanced CT examination, which included non-contrast, arterial phase (approximately 25–30 seconds), and portal venous phase (approximately 65–70 seconds) imaging.

      For the radiomics analysis, images from the portal venous phase were selected, as this phase provides consistent clarity in delineating tumor boundaries and surrounding vasculature. To ensure data consistency, all imaging data underwent preprocessing, including resampling, intensity normalization of grayscale values (standardized using z-score normalization to a mean of 0 and a standard deviation of 1), and N4 bias field correction to address potential low-frequency signal inhomogeneities.
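
      The z-score step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `zscore_normalize`, the optional mask argument, and the random test volume are assumptions, and resampling and N4 bias field correction are not shown.

```python
import numpy as np


def zscore_normalize(volume, mask=None):
    """Rescale intensities to mean 0 and standard deviation 1.

    If a boolean mask is given, the mean and standard deviation are taken
    from the masked voxels only (an assumed option, not from the source).
    """
    volume = np.asarray(volume, dtype=np.float64)
    region = volume[mask] if mask is not None else volume
    mean, std = region.mean(), region.std()
    if std == 0:
        raise ValueError("constant image: standard deviation is zero")
    return (volume - mean) / std


# Hypothetical CT-like volume (slices x rows x columns) for demonstration.
rng = np.random.default_rng(0)
ct = rng.normal(loc=40.0, scale=15.0, size=(4, 8, 8))
normalized = zscore_normalize(ct)
```

      Whether the statistics are computed over the whole volume or only within the region of interest changes the resulting feature values, so the choice should be stated alongside the method.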

      (4) Who performed the segmentation of the lesions? An experienced pancreatic radiologist? A student? How did the investigators ensure that the definition of the lesions was performed correctly? Radiomics features are often sensitive to the segmentation definitions.

      All lesion segmentations were performed on portal venous phase contrast-enhanced CT images. Manual delineation was conducted using 3D Slicer (version 4.11) by two radiologists with extensive experience in pancreatic tumor diagnosis. A consensus was reached between the two radiologists on the ROI definition criteria prior to analysis.

      To further assess the robustness of radiomic features to segmentation boundary variations, we selected a subset of representative cases and created “expanded/shrunk ROIs” by adding or subtracting a 2-pixel margin at the lesion boundary. Feature extraction was then repeated, and the coefficient of variation (CV) for the main features included in the model was found to be below 10%, indicating that the model is stable with respect to minor boundary fluctuations.
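
      The stability check described above can be illustrated with a short sketch. It assumes the CV is the sample standard deviation divided by the absolute mean, computed per feature across the original, expanded, and shrunk ROIs; the response does not specify the exact formula. The feature names and values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / |mean| * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical values of two features extracted from the original,
# expanded (+2 pixels), and shrunk (-2 pixels) ROIs of one lesion.
features = {
    "f1": [10.0, 10.4, 9.7],
    "f2": [0.52, 0.55, 0.50],
}

cv_percent = {name: coefficient_of_variation(v) for name, v in features.items()}

# A feature is treated as stable if its CV stays below the 10% threshold.
stable = {name: cv < 10.0 for name, cv in cv_percent.items()}
```

      With only three ROI variants per case the CV estimate is coarse, so aggregating across the selected cases gives a more reliable picture than any single lesion.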

      (5) Figure 1 is full of vague images that do not convey the study design well. Numbers from each of the datasets, a summary of what data was used for training and for validation, definitions of all of the abbreviations, references to the Roman numerals embedded within the figure, and better labeling of the various embedded graphs are needed. It is not clear whether the graphs are real results or just artwork to convey a concept. I suspect that they are just artwork, but this remains unclear.

      We thank the reviewer for the detailed feedback on Figure 1. We would like to clarify that Figure 1 is a conceptual schematic intended to visually illustrate the overall design of the study, the relationships among different data modules, and the logical sequence of the analytical strategy. It is not meant to present actual results or quantitative details.

      Regarding the reviewer’s concerns about sample sizes, the division between training and validation cohorts, explanations of specific abbreviations, and the precise meaning of each panel, we have provided comprehensive and detailed clarifications in Figure 2.

      (6) The DF selection process lacks important details. Please reference your methods with the Boruta and Lasso models. Please explain what machine learning algorithms were used. There is a reference in the "Feature selection.." section of "the model formula listed below" but I do not see a model formula below this paragraph.

      We thank the reviewer for the thoughtful and detailed comments on the feature selection strategy. We first applied the Boruta algorithm (based on random forests, implemented using the Boruta R package) to the original feature set, which included both radiomics and EV-miRNA features, to identify variables that consistently demonstrated importance across multiple rounds of random resampling.

      Subsequently, we used LASSO regression with five-fold cross-validation to further reduce the dimensionality of the Boruta-selected features and to construct the final feature set used for modeling. The formula for the model is as follows: each regression coefficient is multiplied by the corresponding feature expression level, and the resulting products are summed to generate the Risk Score.
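
      The Risk Score described in words above is a weighted sum, Risk Score = Σ βᵢ·xᵢ, where βᵢ is the regression coefficient and xᵢ the expression level of feature i. A minimal sketch follows; the feature names and coefficient values are hypothetical, since the fitted coefficients are not given in this response.

```python
def risk_score(coefficients, expression):
    """Sum of coefficient * feature value over all model features."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return sum(coef * expression[name] for name, coef in coefficients.items())


# Hypothetical LASSO coefficients for three retained features.
coefficients = {"miR_a": 0.5, "miR_b": -0.25, "radiomic_f1": 1.5}

# Hypothetical feature values for one sample.
sample = {"miR_a": 2.0, "miR_b": 4.0, "radiomic_f1": 1.0}

score = risk_score(coefficients, sample)
```

      Features dropped by LASSO have a coefficient of zero and therefore do not contribute to the sum.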

      (7) In Figure 2, more quantitative details are needed. How are patients dichotomized into non-obese and obese? What does alcohol/smoking mean? Is it simply no to both versus one or the other as yes? These two risk factors should be separated and pack years of smoking should be reported. The details of alcohol use should also be provided. Is it an alcohol abuse history? Any alcohol use, including social drinking? Similarly, "diabetes" needs to be better explained. Type I, type II, type 3c? P values should be shown to demonstrate any statistically significant differences in the proportions of the patients from one dataset to another.

      Our definition of obesity was based on the standard BMI threshold (30 kg/m²). A history of smoking or alcohol consumption was defined as continuous use for more than one year. Specific details regarding smoking and alcohol use were recorded at baseline under the category of “smoking/alcohol history”; unfortunately, we did not collect follow-up data on these variables. As for diabetes, only type II diabetes was documented. Statistically significant p-values have been added. Thank you.

      (8) In the section "Different expression radiomic features between pancreatic benign lesions and aggressive tumors", there is a reference to "MUJH" for the first time. What is this? There is also the first reference to "aggressive tumors" in the section. Do the authors just mean the cases? Otherwise there is no clear definition of "aggressive" (vs. indolent) pancreatic cancer. This terminology of tumor "aggressiveness" either needs to be removed or better defined.

      We have corrected the abbreviation (MUJH); it should in fact be JHC. Additionally, regarding the term "aggressive," we have reviewed the literature and used it to convey the highly malignant nature of pancreatic cancer.

      (9) Figure 3 needs to have the specific radiomic features defined and how these features were calculated. Labeling them as just f1, f2, etc is not sufficient for another group to replicate the results independently.

      We have presented these features in Supplementary Table 1. Kindly refer to it for details.

      (10) It is not clear what Figure 4A illustrates as regards model performance. What do the different colors represent, and what are the models used here? This is very confusing.

      This represents the correlation between WGCNA modules and miRNAs. Different module colors indicate distinct miRNA clusters; for example, the green module contains 12 miRNAs grouped together. The colors themselves do not carry any intrinsic meaning.

      (11) Figure 5 shows results for many more model runs than the described 10, please explain what you are trying to convey with each row. What are "Test A" and "Test B"? There is no description in the manuscript of what these represent. In the figure caption, there is a reference to "our center data" which is not clear. Be more specific about what that data is.

      We have indicated Test A/B/C with arrows in Figure 5. Please check.

      (12) Figure 6 describes the subtypes identified in this study, but the authors do not show a multi-variable cox proportional hazards model to show that this subtype classification independently predicts DFS and OS when incorporating confounding variables. This is essential to show the subtypes are clinically relevant. In particular, the authors need to account for the stage of the patients, and receipt of chemotherapy, surgery, and radiation. If surgery was done, we need to know whether they had R1 or R0 resection. The details about the years in which patients were included is also important.

      We sincerely thank the reviewer for this critical comment. We fully agree that incorporating a multivariate Cox proportional hazards model to control for potential confounding factors would provide a more robust validation of the independent prognostic value of our proposed subtypes for DFS and OS.

      However, as the clinical data used in this study were retrospectively collected and access to certain variables is currently restricted, we were only able to obtain limited clinical information. At this stage, we are unable to systematically include key variables such as tumor staging, adjuvant chemoradiotherapy regimens, and resection margin status (R0 vs. R1), which prevents us from performing a rigorous multivariate Cox analysis.

      Similarly, regarding the postoperative resection status, after reviewing the original surgical reports and pathology records, we regret to confirm that margin status (R0 vs. R1) is missing in a substantial portion of cases, making it unsuitable for reliable statistical analysis.

      We fully acknowledge this as a limitation of the current study and have explicitly addressed it in the Discussion section. To address this gap, we are currently designing a more comprehensive prospective cohort study, which will allow us to validate the clinical independence and utility of the proposed subtypes in future research.

      (13) How do these subtypes compare to other published subtypes?

      We sincerely thank the reviewer for raising this important point. Clusters 1 and 2 represent a novel molecular classification proposed for the first time in this study, driven by EV-miRNA profiles. This classification approach is conceptually independent from traditional transcriptome-based subtyping systems, such as the classical/basal-like subtypes, as well as other existing classification schemes. Comparisons with previously reported subtypes and validation of clinical relevance will require further investigation in future studies.

      Reviewer #3 (Public review):

      Summary:

      The authors appear to be attempting to identify which patients with benign lesions will progress to cancer using a liquid biomarker. They used radiomics and EV miRNAs in order to assess this.

      Strengths:

      It is a strength that there are multiple test datasets. Data is batch-corrected. A relatively large number of patients is included. Only 3 miRNAs are needed to obtain their sensitivity and specificity scores.

      Weaknesses:

      This manuscript is not clearly written, making interpretation of the quality and rigor of the data very difficult. There is no indication from the methods that the patients in their cohorts who are pancreatic cancer patients (from the CT images) had prior benign lesions, limiting the power of their analysis. The data regarding the cluster subtypes is very confusing. There is no discussion or comparison if these two clusters are just representing classical and basal subtypes (which have been well described).

      We are sorry, but we do not have these patient records. Regarding the relationship between Cluster 1/Cluster 2 and classical subtypes: We are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are errors in reference citations and several typos, misspellings, and grammatical errors throughout the manuscript.

      We have made the necessary revisions.

      Reviewer #2 (Recommendations for the authors):

      (1) Were the radiomic features associated with the subtypes and prognostic in the subset of patients who had CT scans?

      Unfortunately, there are no corresponding CT imaging results available for these cases, as the genes were identified based on predicted miRNA targets and were not derived from patients who had undergone CT scans.

      (2) There is a whole body of literature on prognostic imaging-based subtypes of pancreatic cancer that needs to be cited.

      Thank you for your suggestion. We have cited the relevant references accordingly in the manuscript.

      (3) Similarly, the authors should be more comprehensive about prognostic and early detection markers for miRNAs for pancreatic cancer. Early detection markers really should be described separately from prognostic markers. The authors did not do a PROBE phase 3 study, so early detection is not really relevant. Please see https://edrn.nci.nih.gov/about-edrn/five-phase-approach-and-prospective-specimen-collection-retrospective-blinded-evaluation-study-design/

      The primary objective of our study is early detection. We acknowledge the absence of third-phase validation results, which we will address in the limitations section. Additionally, the subtype classification represents our secondary objective.

      (4) If they want to couch this as a PROBE phase 2 study, then they should review the PROBE guidelines and ensure they are meeting standards. Many of the comments above regarding methodologies, definitions, and patient cohort descriptions would address this concern.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (5) The entire manuscript needs to have a review for the use of the English language. There are numerous typos and grammatical errors that make this manuscript difficult to follow and hard to interpret.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (6) In the section on "Definition and identification of low abundance EV-derived miRNA transcripts", provide a reference for the "edger" function.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (7) In the Abstract: The purpose section only mentions early diagnosis as the goal of this study. It seems subtyping is also a major goal, but it is not mentioned.

      The primary objective of our study is early detection. Subtype classification is our secondary objective, so we did not add it to the Purpose section.

      (8) The experimental design fails to describe any of the 8 datasets that were used. How many patients? What were the ethnic and racial backgrounds, which is one of the key aspects of this study and mentioned in the title? What range of stages? When were the images and the blood collected in relation to diagnosis? Over what time frame were the patients included? What patients were excluded, if any? These details are important to understand the materials used, along with the methods to design the signatures and models.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (9) Again, the purpose section of the abstract does not align with the rest of the study, including the description of the experimental design. The last sentence of the experimental design section mentions predicting drug sensitivity and survival, which is unrelated to the aim of early diagnosis.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (10) The results section lacks key details to indicate the impact of the work. Vague descriptions of the findings are not sufficient. The performance of the biomarkers to differentiate benign from malignant lesions, hazard ratios, survival times, and p values should be reported for key results.

      Our aim was to develop an integrated panel for diagnostic purposes; therefore, we provided the AUC to evaluate its performance. However, since this is a diagnostic model, we did not include hazard ratios or survival time data.

      (11) What are "tow" molecular subtypes of pancreatic cancer? Did you mean "two"? What system was used to subtype the pancreatic cancers? Is some new subtyping or a previously published method to subtype the disease?

      Yes, it should read “two”. The subtypes were identified using a previously published method, which we have described in the Methods section.

      Reviewer #3 (Recommendations for the authors):

      The writing of this manuscript needs extensive re-wording and clarification to increase the readability and interpretability of the data presented. The authors could include a dataset of pancreatic cancer patient imaging data where the status of prior benign lesions was detected (as opposed to patients with benign lesions that do not develop pancreatic cancer). The authors could also address if their clusters 1 and 2 are representing (or are correlated with) the classical and basal subtypes that have been well described for pancreatic cancer.

      Thank you to the reviewer for the constructive comments. We sincerely appreciate your careful review, particularly regarding language clarity, data interpretability, and subtype correlation. To enhance the readability and scientific precision of the manuscript, we have conducted a thorough revision and language polishing throughout the text, improving logical structure, terminology consistency, and clarity in result descriptions. We have especially reinforced the Methods and Discussion sections to better explain key analytical steps and data interpretation.

      We fully understand the reviewer’s suggestion to include information on “the presence of benign lesions prior to pancreatic cancer diagnosis.” However, due to the retrospective nature of our study, the current imaging and EV-miRNA datasets do not contain systematically collected follow-up annotations of this type. Therefore, it is not feasible to incorporate such data into the present manuscript.

      That said, we fully recognize the importance of this direction. In future studies, we plan to evaluate longitudinal samples to investigate the dynamic changes in EV-miRNAs and imaging features during the progression from premalignant to malignant states, aiming to clarify their potential value for early cancer warning.

      Regarding the relationship between Cluster 1/Cluster 2 and classical subtypes: We are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Although we attempted a cross-comparison with existing TCGA subtypes, differences in data origin, analysis modality (EV-miRNA vs. tissue transcriptome), and limitations in sample matching prevent us from establishing a direct correspondence. In the revised Discussion, we have emphasized that these two classification approaches are complementary rather than equivalent, reflecting different dimensions of tumor heterogeneity. Further integrative multi-omics studies will be needed to validate their biological significance and clinical utility.

    1. eLife Assessment

      This study on the loss of DEGS1 in the developing larval brain convincingly shows the accumulation of dihydroceramide in the CNS which induces severe alterations in the morphology of glial subtypes as well as a reduction in glial number. The localization of DEGS1/ifc primarily to the ER is also compelling and interesting, and the loss of DEGS1/ifc clearly drives ER expansion and reduces the levels of TGs. This is an important contribution to the role of lipid metabolism in neural development and disease.

    2. Reviewer #1 (Public review):

      Summary:

      Zhu et al. investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. No effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optical lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics, but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses, as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      No major weaknesses identified.

      Minor point: The authors performed lipidomics and RTqPCR on whole larvae and larval CNS, which does not inform on any cell type-specific effects. Performing single-cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss would be interesting to explore in the future, but is beyond the scope of the current study.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplet formation in ifc mutant larvae within the cortex glia, presumably driving the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear. There are, however, some concerns that need to be addressed prior to publication.

      Weaknesses:

      The authors adequately addressed the previously indicated weaknesses, and no new weaknesses have been identified.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: Zhu et al., investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. No effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data. The study could be further strengthened by a few additional controls and/or analyses.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optical lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      (1) The authors performed lipidomics and RTqPCR on whole larvae and larval CNS from which it is impossible to define the cell type-specific effects. Ideally, this could be further supported by performing single cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss.

      We agree that using scRNAseq or pairing FACS-sorting of individual glial subtypes with bulk RNAseq would help tease apart the cell-type specific effects of DEGS1/ifc loss on glial cells. At this time, however, this approach extends beyond the scope of the current paper and means of the lab.

      (2) It's clear from the data that the accumulation of dihydroceramide in the ER triggers ER expansion but it remains unclear how or why this happens. Additionally, the authors assume that, because of the reduction in LD numbers, that the source of fatty acids comes from the LDs. But there is no data testing this directly.

      As CERT, the protein that transports ceramide from the ER to the Golgi, is far more efficient at transporting ceramide than dihydroceramide, we speculate that dihydroceramide accumulates in the ER due to inefficient transport from the ER to the Golgi by CERT. We state this model more explicitly in the results under the subheading “Reduction of dihydroceramide synthesis suppresses the ifc CNS phenotype”.

      We agree with the point on lipid droplet. We observe a correlation, not a causation, between reduction of lipid droplets and a large expansion of ER membrane. We have tried to clarify the text in the last paragraph of the discussion to make this point more clearly. See also response to reviewer 2 point 3.ย 

      (3) The authors performed a beautiful EMS screen identifying several LOF alleles in ifc. However, the authors decided to only use KO/ifcJS3. The paper could be strengthened if the authors could replicate some of the key findings in additional fly lines.

      We agree. We replicated the observed cortex glia swelling, ER expansion in cortex glia, and observed increase in neuronal cell death markers in late-third instar larvae mutant for either the ifcjs1 or ifcjs2 allele. These data are now provided as Supplementary Figure 7.

      (4) The authors use M{3xP3-RFP.attP}ZH-51D transgene as a general glial marker. However, it would be advised to show the % overlap between the glial marker and the RFP since a lot of cells are green positive but not per se RFP positive and vice versa.

      We visually reexamined the expression of the 3xP3 RFP transgene relative to FABP labeling for cortex glia, Ebony for astrocyte-like glia, and the Myr-GFP transgene driven by glial-subtype specific GAL4 driver lines for perineurial, subperineurial, and ensheathing glia. We note that RFP localizes to the nucleus cytoplasm while FABP and Ebony localize to the cytoplasm and Myr-GFP to the cell membrane. Thus, an observed lack of overlap of expression between RFP and the other markers can arise from differential localization of the two markers in the same cells (see, for example, Fig. S2D where Myr-GFP expression in the nuclear envelope encircles that of RFP in the nucleus). Through visual inspection of five larval-brain complexes for each glial subtype marker, we found that essentially all cortex, SPG, and ensheathing glia expressed RFP. Similarly, nearly all astrocyte-like glia also expressed RFP, but they expressed RFP at significantly lower levels than that observed for cortex, SPG, or ensheathing glia. This analysis also confirmed that most perineurial glia do not express RFP. The 3xP3 M{3xP3-RFP.attP}ZH-51D transgene then labels most glia in the Drosophila CNS. We have added text to Supplementary Figure 2 noting the above observations as to which glial cells express RFP.

      (5) The authors indicate that other 3xP3 RFP and GFP transgenes at other genomic locations also label most glia in the CNS. Do they have a preferential overlap with the different glial subtypes?

      We assessed three different types of 3xP3 RFP and GFP transgenes: M{3xP3RFP.attp} transgenes (n=4), Mi{GFP[E.3xP3]=ET1} transgenes (n=3), and Tl{GFP[3xP3.cLa]=CRIMIC.TG4} transgenes (n>6). All labeled cortex glia, but different lines exhibited differential labeling of astrocyte and ensheathing glia. These data are now included as Supplementary Figure 3.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplet formation within the cortex glia of ifc mutant larvae, which presumably drives the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear.

      Weaknesses:

      (1) The authors clearly demonstrated a reduction in number of glia in the larval brains of ifc mutant flies. What remains unclear is whether ifc loss leads to glial apoptosis or a failure for glia to proliferate during development. The authors should distinguish between these two hypotheses using apoptotic markers and cell proliferation markers in glia.

      To address this point, we used phospho-histone H3 to assess mitotic index in the thoracic CNS of wild-type versus ifc mutant late third instar larvae and found a mild, but significant reduction in mitotic index in ifc mutant relative to wild-type nerve cords. We also assessed the ability of glial-specific expression of the potent anti-apoptotic gene p35 to rescue the observed loss of cortex glia phenotype in the thoracic region of the CNS of otherwise ifc mutant larvae and observed a clear increase in cortex glia in the presence versus the absence of glial-specific p35 expression (p < 3 × 10^-4). These data are now provided as Supplementary Figure S8 in the paper and referred to on page 8.

      (2) It is surprising that human DEGS1 expression in glia rescues the noted phenotypes despite the different preference for sphingoid backbone between flies and mammals. Though human DEGS1 rescued the glial phenotypes described, can animal lethality be rescued by glial expression of human DEGS1? Are there longer-term effects of loss of ifc that cannot be compensated by the overexpression of human DEGS1 in glia (age-dependent neurodegeneration, etc.)?

      We note explicitly that while glial expression of human DEGS1 does provide rescuing activity, it only partially rescues the ifc mutant CNS phenotype in contrast to glial expression of Drosophila ifc, which fully rescues this phenotype. Thus, the relative activity of human DEGS1 is far below that of Drosophila ifc when assayed in flies. To quantify the functional difference between the two transgenes, we assessed the ability of glial expression of fly ifc or of human DEGS1 to rescue the lethality of otherwise ifc mutant larvae: Glial expression of ifc was sufficient to rescue the adult viability of 57.9% of ifc mutant flies based on expected Mendelian ratios (n=2452), whereas glial expression of DEGS1 was sufficient to rescue just 3.9% of ifc mutant flies (n=1303), uncovering a ~15-fold difference in the ability of the two transgenes to rescue the lethality of otherwise ifc mutant flies. In the absence of either transgene, no ifc mutant larvae reached adulthood (n=1030). These data are now provided in the text on page 9 of the revised manuscript.
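
      As a quick check on the ~15-fold figure quoted above, the ratio of the two rescue percentages can be recomputed directly. This is only an illustrative sketch: the variable names are hypothetical, and the only inputs are the two percentages stated in the response.

```python
# Rescue of adult viability, as a percentage of the expected Mendelian ratio,
# taken from the figures quoted in the response (variable names are illustrative).
ifc_rescue_pct = 57.9    # glial expression of fly ifc (n=2452)
degs1_rescue_pct = 3.9   # glial expression of human DEGS1 (n=1303)

# Fold difference in rescuing ability between the two transgenes.
fold_difference = ifc_rescue_pct / degs1_rescue_pct
print(f"fold difference: {fold_difference:.1f}")
```

      The ratio comes out just under 15, consistent with the rounded value given in the response.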

      (3) The mechanistic link between the loss of ifc and lipid droplet defects is missing. How do defects in ceramide metabolism alter triglyceride utilization and storage? While the author's argument that the loss of lipid droplets in larval glia will lead to defects in neuronal ensheathment, a discussion of how this is linked to ceramides needs to be added.

      We have revised the text to address this point. We speculate that the apparent increased demand for membrane phospholipid synthesis may drive the depletion of lipid droplets, providing a link to ifc function and ceramides. Below we provide the rewritten last paragraph; the underlined section is the new text.

      “The expansion of ER membranes coupled with loss of lipid droplets in ifc mutant larvae suggests that the apparent demand for increased membrane phospholipid synthesis may drive lipid droplet depletion, as lipid droplet catabolism can release free fatty acids to serve as substrates for lipid synthesis. At some point, the depletion of lipid droplets, and perhaps free fatty acids as well, would be expected to exhaust the ability of cortex glia to produce additional membrane phospholipids required for fully enwrapping neuronal cell bodies. Under wild-type conditions, many lipid droplets are present in cortex glia during the rapid phase of neurogenesis that occurs in larvae. During this phase, lipid droplets likely support the ability of cortex glia to generate large quantities of membrane lipids to drive membrane growth needed to ensheathe newly born neurons. Supporting this idea, lipid droplets disappear in the adult Drosophila CNS when neurogenesis is complete and cortex glia remodeling stops. We speculate that lipid droplet loss in ifc mutant larvae contributes to the inability of cortex glia to enwrap neuronal cell bodies. Prior work on lipid droplets in flies has focused on stress-induced lipid droplets generated in glia and their protective or deleterious roles in the nervous system. Work in mice and humans has found that more lipid droplets are often associated with the pathogenesis of neurodegenerative diseases, but our work correlates lipid droplet loss with CNS defects. In the future, it will be important to determine how lipid droplets impact nervous system development and disease.”

      (4) On page 10, the authors use the words "strong" and "weak" to describe where ifc is expressed. Since the use of T2A-GAL4 alleles in examining gene expression is unable to delineate the amount of gene expression from a locus, the terms "broad" and "sparse" labeling (or similar terms) should be used instead.

      The ifc T2A-GAL4 insert in the ifc locus reports on the transcription of the gene. We agree that the GAL4 system will not reflect differences in the amount of gene expression when the expression levels are not dramatically different. However, when the expression levels differ dramatically, as in our case, the GAL4 system can reflect this difference in the expression of a reporter gene. We reworded this section to suggest that ifc is transcribed at higher levels in glia as compared to neurons. We can’t use sparse or broad, as ifc is expressed in all, or at least in most, glia and neurons. The new text is as follows: “Using this approach, we observed strong nRFP expression in all glial cells (Figures 4D and S10A) and modest nRFP expression in all neurons (Figures 4E and S10B), suggesting ifc is transcribed at higher levels in glial cells than neurons in the larval CNS.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

      Weaknesses:

      I didn't find any obvious weakness.

      Reviewer #1 (Recommendations For The Authors):

      Additional minor comments below:

      (1) The authors state that TGs are the building blocks of membrane phospholipids. This is not exactly true. The breakdown of TGs can result in free FAs which can be used for membrane phospholipid synthesis. Also, membrane phospholipids can also be generated from free FAs that were never in TGs.

      To address this point, we have reworked a number of sentences in the text. On page 12 we reworded two small sections to the following:

      “In the CNS, lipid droplets form primarily in cortex glia[29] and are thought to contribute to membrane lipid synthesis through their catabolism into free fatty acids versus acting as an energy source in the brain.[41] Consistent with the possibility that increased membrane lipid synthesis drives lipid droplet reduction, RNA-seq assays of dissected nerve cords revealed that loss of ifc drove transcriptional upregulation of genes that promote membrane lipid biogenesis”

      As TG breakdown results in free fatty acids that can be used for membrane phospholipid synthesis, we asked if changes in TG levels and saturation were reflected in the levels or saturation of the membrane phospholipids phosphatidylcholine (PC), phosphatidylethanolamine (PE), and phosphatidylserine (PS).

      (2) Figure 5J what does the dotted line indicate? Please specify in the figure legend or remove it.

      We have added the following text in the figure legend: Dotted line indicates a log2 fold change of 0.5 in the treatment group compared to the control group.
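
      For readers more used to linear ratios, the dotted-line threshold can be converted from log2 units. The sketch below assumes the usual convention that the log2 fold change is log2(treatment / control); that convention and the variable names are assumptions, not details stated in the legend.

```python
# Threshold marked by the dotted line in Figure 5J, in log2 units.
log2_fc_threshold = 0.5

# Assuming log2 fold change = log2(treatment / control),
# the equivalent linear ratio is 2 raised to that value.
linear_fold_change = 2 ** log2_fc_threshold
print(f"linear fold change: {linear_fold_change:.2f}")
```

      Under that assumption, a log2 fold change of 0.5 corresponds to roughly a 1.4-fold increase over control.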

      (3) The text for your graphs is hard to read. Please make the font larger.

      We have increased font size to enhance the readability of the figures.

      (4) The authors mentioned that driving ifc expression in neurons rescues the phenotypes (ref 17). While the glial-specific role presented in this study is robust, I think some readers would appreciate some discussion of this study in light of the data presented here.

      We have added the below text on page 10 to address this point.

      “Results of our gene rescue experiments conflict with a prior study on ifc in which expression of ifc in neurons was found to rescue the ifc phenotype. In this context, we note that elav-GAL4 drives UAS-linked transgene expression not just in neurons, but also in glia at appreciable levels, and thus needs to be paired with repo-GAL80 to restrict GAL4-mediated gene expression to neurons. Thus, “off-target” expression in glial cells may account for the discrepant results. It is, however, more difficult to reconcile how neuronal or glial expression of ifc would rescue the observed lethality of the ifc-KO chromosome given the presence of additional lethal mutations in the 21E2 region of the second chromosome.”

      (5) While the analysis of fatty acid saturation is experimentally well done, I'm not really sure what the significance of this data is.

      We included this information as a reference for future analysis of additional genes in the ceramide biogenesis pathway, as we expect that alteration of the levels and saturation levels of PE, PC, and PS in cell membranes may underlie key changes in the biophysical properties of glial cell membranes and their ability to enwrap or infiltrate their targets. Thus, we expect the significance of these data to grow as more work is done on additional members of the ceramide pathway in the nervous system in flies and other systems.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a typo at the top of page 11: "internal membranes and fail enwrap neurons" is missing the word "to" before "enwrap"

      The typo was fixed.

      (2) PMID: 36718090 should be included in the discussion of SPT and ORMDL complex in human disease.

      The reference was added.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      In summary, this manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions. I have no additional comments and fully support the publication of this manuscript in eLife.

      The authors also note that they added one paragraph to the discussion that addresses the possibility that the increased detection of cell death markers could arise due to the inability of glial cells to remove cellular debris. The text of this paragraph is provided below:

      We note that cortex glia are the major phagocytic cell of the CNS and phagocytose neurons targeted for apoptosis as part of the normal developmental process.[23-26] Thus, while we favor the model that loss of ifc triggers neuronal cell death due to glial dysfunction, it is also possible that increased detection of dying neurons arises, at least in part, from a decreased ability of cortex glia to clear dying neurons from the CNS. At present, the large number of neurons that undergo developmentally programmed cell death combined with the significant disruption to brain and ventral nerve cord morphology caused by loss of ifc function render this question difficult to address. Additional evidence does, however, support the idea that loss of ifc function drives excess neuronal cell death: Clonal analysis in the fly eye reveals that loss of ifc drives photoreceptor neuron degeneration[17], indicating that loss of ifc function drives neuronal cell death; cortex-glia specific depletion of CPES, which acts downstream of ifc, disrupts neuronal function and induces photosensitive epilepsy in flies[59], indicating that genes in the ceramide pathway can act nonautonomously in glia to regulate neuronal function; recent genetic studies reveal that other glial cells can compensate for impaired cortex glial cell function by phagocytosing dying neurons[62], and we observe that the cell membranes of subperineurial glia enwrap dying neurons in ifc mutant larvae (Fig. S14), consistent with similar compensation occurring in this background; and in humans, loss of function mutations in DEGS1 cause neurodegeneration.[7-9] Clearly, future work is required to address this question for ifc/DEGS1 and perhaps other members of the ceramide biogenesis pathway.

    1. eLife Assessment

      This study is important as it highlighted how IL-4 regulates the reactive state of a specific microglial population by increasing the proportion of CD11c+ microglial cells and ultimately suppressing neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions. It provided convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain.

    2. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how IL-4 modulates the reactive state of microglia in the context of neuropathic pain. Specifically, they sought to determine whether IL-4 drives an increase in CD11c+ microglial cells, a population associated with anti-inflammatory responses, and whether this change is linked to the suppression of neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions.

      Strengths:

      The methodological approach in this study is robust, providing convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain. The experimental design is well thought out, utilizing two distinct neuropathic pain models (SpNT and SNI), each yielding different outcomes. The SpNT model demonstrates spontaneous pain remission and an increase in the CD11c+ microglial population, which correlates with pain suppression. In contrast, the SNI model, which does not show spontaneous pain remission, lacks a significant increase in CD11c+ microglia, underscoring the specificity of the observed phenomenon. This design effectively highlights the role of the CD11c+ microglial population in pain modulation. The use of behavioral tests provides a clear functional assessment of IL-4 manipulation, and pharmacogenetic tools allow for precise control of microglial populations, minimizing off-target effects. Notably, the manipulation targets the CD11c promoter, which presumably reduces the risk of non-specific ablation of other microglial populations, strengthening the experimental precision. Moreover, the thorough characterization of microglial markers adds depth to the analysis, ensuring that the changes in microglial populations are accurately linked to the behavioral outcomes.

      Weaknesses:

      One potential limitation of the study is that the mechanistic details of how IL-4 induces the observed shift in microglial populations are not fully explored. While the study demonstrates a correlation between IL-4 and CD11c+ microglial cells, a deeper investigation into the specific signaling pathways and molecular processes driving this population shift would greatly strengthen the conclusions. Additionally, the paper does not clearly integrate the findings into the broader context of microglial reactive state regulation in neuropathic pain.

      Comments on revisions:

      In the revised manuscript, the authors have successfully addressed my previous concerns as well as those of the other reviewers. I do not have further concerns about this study.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Kohno et al. examined whether the anti-inflammatory cytokine IL-4 attenuates neuropathic pain by promoting the emergence of antinociceptive microglia in the dorsal horn of the spinal cord. In two models of neuropathic pain following peripheral nerve injury, intrathecal administration of IL-4 once a day for 3 days from day 14 to day 17 after injury, attenuates hypersensitivity to mechanical stimuli in the hind paw ipsilateral to nerve injury. Such an antinociceptive effect correlates with a higher number of CD11c+ microglia in the dorsal horn of the spinal cord which is the termination area for primary afferent fibres injured in the periphery. Interestingly, CD11c+ microglia emerge spontaneously in the dorsal horn in concomitance with the resolution of pain in the spinal nerve model of pain, but not in the spared nerve injury model where pain does not resolve, confirming that this cluster of microglia is involved in the resolution of pain.

      Based on existing evidence that the receptor for IL-4, namely IL-4R, is expressed by microglia, the authors suggest that IL-4R mediates IL-4 effect in microglia including up-regulation of Igf1 mRNA. They have previously reported that IGF-1 can attenuate pain neuron activity in the spinal cord.

      Strengths:

      This study includes cutting-edge techniques such as flow cytometry analysis of microglia and transgenic mouse models.

      Weaknesses:

      The conclusion of this paper is supported by data, but the interpretation of some data requires clarification.

      We appreciate the reviewer's careful reading of our paper. According to the reviewer's comments, we have performed new immunohistochemical experiments and added some discussion in the revised manuscript (please see the point-by-point responses below).

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how IL-4 modulates the reactive state of microglia in the context of neuropathic pain. Specifically, they sought to determine whether IL-4 drives an increase in CD11c+ microglial cells, a population associated with anti-inflammatory responses and whether this change is linked to the suppression of neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions.

      Strengths:

      The methodological approach in this study is robust, providing convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain. The experimental design is well thought out, utilizing two distinct neuropathic pain models (SpNT and SNI), each yielding different outcomes. The SpNT model demonstrates spontaneous pain remission and an increase in the CD11c+ microglial population, which correlates with pain suppression. In contrast, the SNI model, which does not show spontaneous pain remission, lacks a significant increase in CD11c+ microglia, underscoring the specificity of the observed phenomenon. This design effectively highlights the role of the CD11c+ microglial population in pain modulation. The use of behavioral tests provides a clear functional assessment of IL-4 manipulation, and pharmacogenetic tools allow for precise control of microglial populations, minimizing off-target effects. Notably, the manipulation targets the CD11c promoter, which presumably reduces the risk of non-specific ablation of other microglial populations, strengthening the experimental precision. Moreover, the thorough characterization of microglial markers adds depth to the analysis, ensuring that the changes in microglial populations are accurately linked to the behavioral outcomes.

      Weaknesses:

      One potential limitation of the study is that the mechanistic details of how IL-4 induces the observed shift in microglial populations are not fully explored. While the study demonstrates a correlation between IL-4 and CD11c+ microglial cells, a deeper investigation into the specific signaling pathways and molecular processes driving this population shift would greatly strengthen the conclusions. Additionally, the paper does not clearly integrate the findings into the broader context of microglial reactive state regulation in neuropathic pain.

      We thank the reviewer for these insightful comments on our paper. As the reviewer suggested, further investigation of the specific signaling pathways and molecular processes by which IL-4 induces a transition of spinal microglia to the CD11c+ state would strengthen our conclusion and also provide important clues to discovering new therapeutic targets. In revising the manuscript, we have included this in the Discussion section (line 264-267), and we hope that future studies clarify these points. As for the additional comment, we have added a brief summary of existing research on microglial function in neuropathic pain at the beginning of the Discussion section (line 188–196).

      Reviewer #1 (Recommendations for the authors):

      The conclusions of this paper are supported by data, but the interpretation of some data requires clarification.

      (1) In Figure 1D and Figure 7 C, CD11c+ microglia numbers are higher in contralateral dorsal horns after IL-4 administration despite IL-4 having no effect on pain thresholds. The authors should discuss these findings.

      As the reviewer pointed out, IL-4 increased the number of CD11c+ microglia in the contralateral spinal dorsal horn (SDH) but did not affect pain thresholds in the contralateral hindpaw. The data seem to be related to the selective effect of CD11c+ microglia and their factors (especially IGF1) on nerve injury-induced pain hypersensitivity. In fact, depletion of CD11c+ spinal microglia and intrathecal administration of IGF1 do not elevate pain threshold of the contralateral hindpaw (Science 376: 86–90, 2022). We have added the above statement in the Discussion section (line 208–213).

      (2) Do monocytes infiltrate the dorsal horn and DRG after intrathecal injections?

      To address this reviewer's comment, we performed new immunohistochemical experiments to analyze monocytes in the SDH using an antibody for CD169 (a marker for bone marrow-derived monocytes/macrophages, but not for resident microglia) (J Clin Invest 122: 3063–3087, 2012; Cell Rep 3: 605–614, 2016) and found no CD169+ monocytes in the SDH parenchyma after SpNT. Consistent with this data, we have previously demonstrated that few bone marrow-derived monocytes/macrophages are recruited to the SDH after SpNT (Sci Rep 6: 23701, 2016). Similarly, no CD169+ monocytes in the SDH parenchyma were observed in SpNT mice treated intrathecally with PBS or IL-4 (Figure 1-figure supplement 1A).

      In the DRG, CD169 is constitutively expressed in macrophages. Thus, in accordance with a recent report showing that monocytes infiltrating the DRG are positive for chemokine (C-C motif) receptor 2 (CCR2) (J Exp Med 221: e20230675, 2024), we analyzed CCR2+ cells and found that CCR2+ IBA1dim monocytes were observed in the capsule and parenchyma of the DRG of naive mice (Figure 1-figure supplement 1B). After SpNT, CCR2+ IBA1dim monocytes in the DRG parenchyma increased. Intrathecal treatment of IL-4 increased CCR2+ IBA1dim cells in the DRG capsule. However, the involvement of these monocytes in the DRG in IL-4-induced alleviation of neuropathic pain is unclear and warrants further investigation. In revising the manuscript, we have included additional data (Figure 1-figure supplement 1) and corresponding text in the Results (line 112–114) and Discussion section (line 218–222).

      (3) In Figure 4, depletion of CD11c+ cells in dorsal root ganglia (DRG) ameliorates neuropathic thresholds but does not alter the anti-nociceptive effect of IL-4 injected intrathecal. It appears that CD11c+ macrophages in DRG have an opposite role to CD11c+ microglia in the spinal cord. Please discuss this result.

      We apologize for the confusion. The aim of the experiments in Figure 4 was to examine the contribution of CD11c+ cells in the DRG to the pain-alleviating effect of intrathecal IL-4. For this aim, we depleted CD11c+ cells in the DRG (but not in the SDH) by intraperitoneal injection of diphtheria toxin (DTX) immediately after the behavioral measurements performed on day 17 (Fig. 4A, B). On day 18, the paw withdrawal threshold of DTX-treated mice was almost similar to that of PBS-treated mice, indicating that the depletion of CD11c+ cells in the DRG does not affect the pain-alleviating effect of IL-4. These data are in stark contrast to those obtained from mice with depletion of CD11c+ cells in the SDH by intrathecal DTX (the depletion canceled the effect of IL-4) (Figure 2A). Thus, it is conceivable that CD11c+ cells in the DRG are not involved in the IL-4-induced alleviating effect on neuropathic pain. Because the confusion might be related to the statement in this paragraph of the initial version, we have modified our statements to make this point more clearly (line 133–139).

      Reviewer #2 (Recommendations for the authors):

      A discussion addressing how these results fit into existing research on microglial function in pain would enhance the study's impact.

      A brief summary of existing research on microglial function in neuropathic pain has been included at the beginning of the Discussion section (lines 188–196).

      It would be helpful for the authors to elaborate on the implications of their findings within the larger landscape of immune regulation in neuropathic pain.

      Our present findings showed an ability of IL-4, known as a T-cell-derived factor, to increase CD11c+ microglia and to control neuropathic pain. Furthermore, recent studies have also indicated that immune cells such as CD8+ T cells infiltrating into the spinal cord (Neuron 113: 896-911.e9, 2025), and regulatory T cells (eLife 10: e69056, 2021; Science 388: 96–104, 2025) and MRC1+ macrophages in the spinal meninges (Neuron 109: 1274–1282, 2021) have important roles in regulating microglial states and neuropathic pain. Thus, these findings provide new insights into the mechanisms of the neuro-immune interactions that regulate neuropathic pain. In revising the manuscript, we have added the above statement to the Discussion section (lines 254–260).

      Furthermore, a discussion on how these findings could inform the development of targeted therapies that modulate microglial populations in a controlled, disease-specific manner would be valuable. Exploring how these insights could lead to novel treatment strategies for neuropathic pain could provide important future directions for the research and broader clinical applications.

      We appreciate the reviewer's valuable suggestion. Our current data, demonstrating that IL-4 increases CD11c+ microglia without affecting the total number of microglia, could open a new avenue for developing strategies to modulate microglial subpopulations through molecular targeting, which may lead to new analgesics. However, given IL-4's association with allergic responses, targeting microglia-selective molecules involved in shifting microglia toward the CD11c+ state (such as intracellular signaling molecules downstream of IL-4 receptors) may offer a more selective and safer therapeutic approach. Moreover, since CD11c+ microglia have been implicated in other CNS diseases [e.g., Alzheimer disease (Cell 169: 1276–1290, 2017), amyotrophic lateral sclerosis (Nat Neurosci 25: 26–38, 2022), and multiple sclerosis (Front Cell Neurosci 12: 523, 2019)], further investigations into the mechanisms driving CD11c+ microglia induction could provide insights into novel therapeutic strategies not only for neuropathic pain but also for other CNS diseases. In revising the manuscript, we have added the above statement to the Discussion section (lines 260–271).

    1. eLife Assessment

      This study provides valuable findings regarding potential correlates of protection against the African swine fever virus. The evidence supporting the claims is solid, although analysis using a higher number of animals and other virus strains will be required to further evaluate the relevance of the immune parameters associated to protection. The work will be of broad interest to veterinary immunologists, and particularly those working on African swine fever.

    2. Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on a comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, showing the latter an enhanced capacity to induce a protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge on basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      (3) Related to the previous points, do the authors consider it expected that the induction of immunosuppressive mechanisms during such a prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhausting markers to immunophenotype ASFV Estonia-induced T cells?

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNα are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

    4. Author Response:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on a comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, showing the latter an enhanced capacity to induce a protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge on basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which will significantly contribute to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer's comments and outline the revisions we plan to incorporate.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection significantly differ from whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors consider it expected that the induction of immunosuppressive mechanisms during such a prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhausting markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We will address this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNα are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We will include a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of the age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We will incorporate these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections will be revised accordingly, and the discussion will be extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we will clarify our conclusions where needed and ensure that interpretations are better aligned with the data shown.

    1. eLife Assessment

      This study presents a potentially fundamental analysis of a fossil feather from a 125-million-year-old enantiornithine bird. Using sophisticated 3D microscopic and numerical methods, the authors conclude that the feather was iridescent and brightly colored, possibly indicating that this was a male bird that used its crest in sexual displays. At present, the strength of evidence supporting the conclusions is considered incomplete based on methodological shortcomings and questions about taphonomy.

    2. Reviewer #1 (Public review):

      Summary:

      Li et al describe a novel form of melanosome based iridescence in the crest of an Early Cretaceous enantiornithine avialan bird from the Jehol Group.

      This is an interesting manuscript that describes never before seen melanosome structures and explores fossilised feathers through new methods. This paper creates an opening for new work to explore coloration in extinct birds.

      Strengths:

      A novel set of methods applied to the study of fossil melanosomes.

      Comments on revised version:

      The authors provided a response to the previous 9 issues, for which additional response is provided here:

      (1) I respectfully disagree with the authors' justification regarding the crest. They show one specimen of Confuciusornis with short feathers (which appears to be a unique feature of this species, possibly related to the fact it is beaked) but what about the more primitive Eoconfuciusornis, a referred specimen of which superficially has an enormous "crest" (Zheng et al 2017), as does Changchengornis (Ji et al 1999). Regardless, it would make more sense to compare this new specimen to other enantiornithines. Although limited by the preservation of body feathers, which is not all that common, the following published enantiornithines also exhibit a "crest": bohaiornithid indet. (Peteya et al 2017); Brevirostruavis (Li et al 2021); Dapingfangornis (Li et al 2006); Eoenantiornis (Zhou et al 2005); Grabauornis (Dalsatt et al 2014); Junornis (Liu et al 2017); Longirostravis (Hou et al 2004); Monoenantiornis (Hu & O'Connor 2016); Neobohaiornis (Shen et al 2024); Orienantiornis (Liu et al 2019); Parabohaironis (Wang 2023); Parapengornis (Hu et al 2015); Paraprotopteryx (Zheng et al 2007); and every specimen of Protopteryx. In fact, every single published enantiornithine that preserves any feathering on the head has the feathers preserved perpendicular to the bone (in fact, the body feathers on all parts of the bed are splayed at a right angle to the bone due to compression), as shown in the Confuciusornis specimen image provided by the authors. Since it is highly improbable they all had crests, the authors have no justification for the interpretation that this new specimen was crested. This does not mean that the feathers were not iridescent or take away from the novel methods these authors have used to explore preserved feathers.

      (2) Yes, this is possible, but see above for the very strong argument against interpretation of these feathers as forming a crest.

      (3) This just further makes the point that the isolated feather is not likely from the head. Since the neck feathers are missing, it is more likely that it is these feathers that have been disarticulated (and sampled) from the neck region rather than from the very complete looking head feathers; this has significant implications with regards to the bird's colour pattern.

      (4) Thank you for acknowledging taphonomy.

      (5) An interesting hypothesis and one I look forward to seeing explored in the future.

      (6) Since the compression is in a single direction, in fact it is not reasonable to assume that distortion would be random. One might predict similar distortion, as with the feathers (spread out from the bone at a 90° angle) and bone (crushed), which are all compressed in a single direction. However, I agree that such a consistent discovery suggests it is not an artifact of preservation, and only further studies will elucidate this.

      (7) I still fail to detect this hexagonal pattern - could machine learning be used to quantify this pattern? The random arrangement of white arrows does little to clarify the authors' interpretations.

      (8) Great to see additional sampling

      (9) Thank you for the explanation.

    3. Reviewer #3 (Public review):

      Summary:

      The paper presents an in-depth analysis of the original colour of a fossil feather from the crest of a 125-million-year-old enantiornithine bird. From its shape and location, it would be predicted that such a feather might well have shown some striking colour and pattern. The authors apply sophisticated microscopic and numerical methods to determine that the feather was iridescent and brightly coloured, and possibly indicates this was a male bird that used its crest in sexual displays.

      Strengths:

      The 3D micro-thin-sectioning techniques and the numerical analyses of light transmission are novel and state of the art. The example chosen is a good one, as a crest feather likely to have carried complex and vivid colours as a warning or for use in sexual display. The authors correctly warn that without such 3D study feather colours might be given simply as black from regular 2D analysis, and the alignment evidence for iridescence could be missed.

      Weaknesses: Trivial

    4. Author response:

      The following is the authors' response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Li et al describe a novel form of melanosome based iridescence in the crest of an Early Cretaceous enantiornithine avialan bird from the Jehol Group.

      Strengths:

      Novel set of methods applied to the study of fossil melanosomes.

      Weaknesses:

      (1) Firstly, several studies have argued that these structures are in fact not a crest, but rather the result of compression. Otherwise, it would seem that a large number of Jehol birds have crests that extend not only along the head but the neck and hindlimb. It is more parsimonious to interpret this as compression as has been demonstrated using actuopaleontology (Foth 2011).

      Firstly, we respectfully acknowledge the reviewer's interpretation.

      However, the new specimen we report here is preserved differently from Confuciusornis (Foth 2011), which belongs to a different clade and exhibits a differently preserved feather crest of a different shape compared to the species described in this study. In the specimen the referee refers to (Figure 3a in Foth 2011, Paläontologische Zeitschrift), the cervical feathers are much longer than the feathers from the head region; it is quite incompletely preserved and much shorter in proportional length (relative to the skull) than the specimen we sampled (see picture below).

      Author response image 1.

      Our new specimen, in which the well-preserved feather crest is interpreted as retaining its original shape; the cervical feathers are largely absent or very short.

      In the new specimen there is a large feather crest that gradually extends from the cranial region of the fossil bird, rather than the cervical region, as observed in the previously proposed Confuciusornis crest. The feather crest extends in a consistent direction (caudodistally), and the feathers in the head region of the bird are exceptionally well-preserved, retaining their original shape. The feathers measure about 1–2 cm at their longest barb. Feathers in the neck are much shorter (see Confuciusornis picture above).

      (2) The primitive morphology of the feather with their long and possibly not interlocking barbs also questions the ability of such feathers to be erected without geologic compression.

      We acknowledge that the specimen must have undergone some degree of compression during diagenesis and fossilization. Given that the rachis itself is already sufficiently thick (that the ligaments everting a crest would attach to), we conclude that it had the structural integrity to remain erect on the skull.

      (3) The feather is not in situ and therefore there is no way to demonstrate unequivocally that it is indeed from the head (it could just as easily be a neck feather)

      We conclude that it belongs to the head based on the similar suture, overall length, and its close position to the caudal part of the head. There are no similar types of feathers nearby, such as those found on the neck or other areas, which is why we reason that it is a head crest feather. Besides, the shape of the feather we sampled is dramatically different from the much softer and shorter ones detected on the neck.

      In addition, we further sampled the crest feather barb from in situ preserved feather crest. We also detected a similar pattern to what we originally found regarding the packing of melanosomes. This is now added to the text.

      (4) Melanosome density may be taphonomic; in fact, in an important paper that is notably not cited here (Pan et al. 2019) the authors note dense melanosome packing and attribute it to taphonomy. This paper describes densely packed (taphonomic) melanosomes in non-avian avialans, specifically stating, "Notably, we propose that the very dense arrangement of melanosomes in the fossil feathers (Fig. 2 B, C, and G-I, yellow arrows) does not reflect in-life distribution, but is, rather, a taphonomic response to postmortem or postburial compression" and if this paper was taken into account it seems the conclusions would have to change drastically. If in this case the density is not taphonomic, this needs to be justified explicitly (although clearly these Jehol and Yanliao fossils are heavily compressed).

      We have added a line acknowledging this possibility. We have accounted for the shrinkage effects caused by heat and compression, as detailed in our Supplementary Information (SI) file. Even when these changes are considered, they do not alter the main conclusions of our study. Besides, given that the melanosomes we used for simulation are mostly complete and well preserved, we consider the distortion to be rather limited, or at least minor compared to the changes seen in taphonomic experiments.

      (5) Color in modern birds is affected by the outer keratin cortex thickness which is not preserved but the authors note the barbs are much thicker (10um) than extant birds; this surely would have affected color so how can the authors be sure about the color in this feather?

      In extant birds, feather barbs of similar size are primarily composed of air spaces and quasi-ordered keratin structures, largely lacking dense melanosomes. The color-producing barb we have described here does not directly correspond to a feather type in modern birds for comparison. Since there is no direct extant analog to inform the keratin thickness and similar melanosome density, we apply an advanced 3-D FDTD modeling approach to the question of coloration reconstruction, rather than relying on statistical DFA approaches. In addition to the packed melanosomes, the thin external keratin cortex layer is also considered in the simulation.

      Additionally, even in the thinner melanosome-packed layers of barbules in living birds, iridescent coloration often is observed (e.g., Rafael Maia J. R. Soc. Interface 2009). This further supports the plausibility of our modeling approach and its relevance to understanding coloration in both extinct and extant species.

      (6) Authors describe very strange shapes that are not present in extant birds: "...different from all other known feather melanosomes from both extant and extinct taxa in having some extra hooks and an oblique ellipse shape in cross and longitudinal sections of individual melanosome" but again, how can it be determined that this is not the result of taphonomic distortion?

      We consistently observed similar hook-like structures not only in this feather but also in feathers from different positions of the crest. We do not believe that distortion would produce such a regular and consistent pattern; instead, distortion likely would result in random alterations, as demonstrated by prior taphonomic experiments.

      (7) The authors describe the melanosomes as hexagonally packed but this does not appear to be in fact the case, rather appearing quasi-periodic at best, or random. If the authors could provide some figures to justify this hexagonal interpretation?

      To further validate the regional hexagonal pattern, we expanded our sampling to additional sites. We observed similar patterns not only in various regions of the same barb but also across different feathers (see added SI Figures below). This extensive sampling supports the validity of the melanosome patterns identified in our original analysis.

      (8) One way to address these concerns would be to sample some additional fossil feathers to see if this is unique or rather due to taphonomy

      We sampled additional areas from the same feather as well as feathers from other regions of the head crest. The packing patterns are generally similar with slight variations in size (figure S6).

      (9) On a side note, why are the feet absent in the CT scan image?

      To achieve better image resolution, the field of view was adjusted, resulting in part of the feet being excluded from the CT scan.

      Reviewer #2 (Public review):

      Summary:

      The authors reconstructed the three-dimensional organization of melanosomes in fossilized feathers belonging to a spectacular specimen of a stem avialan from China. The authors then proceed to infer the original coloration and related ecological implications.

      Strengths:

      I believe the study is well executed and well explained. The methods are appropriate to support the main conclusions. I particularly appreciate how the authors went beyond the simple morphological inference and interrogated the structural implications of melanosome organization in three dimensions. I also appreciate how the authors were upfront with the reliability of their methods, results, and limitations of their study. I believe this will be a landmark study for the inference of coloration in extinct species and how to interrogate its significance in the future.

      We thank the referee for these positive comments.

      Weaknesses:

      I have a few minor comments.

      Introduction: I would suggest the authors move the paragraph on coloration in modern birds (lines 75-97) before line 64, as this is part of the reasoning behind the study. I believe this change would improve the flow of the introduction for the general reader.

      We thank the referee for the suggestion and have made changes accordingly to improve the flow of the introduction.

      Melanosome organization: I was surprised to find little information in the main text regarding this topic. As this is one of the major findings of the study, I would suggest the authors include more information regarding the general geometry/morphology of the single melanosomes and their arrangement in three dimensions.

      We thank the referee for this suggestion. We elaborated on the details of the melanosomes in the results as follows:

      Hooks are commonly observed on the oval-shaped melanosomes in cross-sectional views, with two dominant types identified on the dorsal and ventral sides (Figure 3c-d, red arrows). These hooks are deflected in opposing directions, linking melanosomes from different arrays (dorsal-ventral) together. The major axis (y) of the oval-shaped melanosomes (mean = 283 nm) is oriented toward the left side in cross-section, while the shorter axis (x) measures approximately 186 nm (Table S2). In oblique or near-longitudinal sections (Figure 3e-f), the hooked structures' connections to the distal and proximal sides of neighboring melanosomes are clearly visible (blue arrows, Figure 3f). A similar pattern occurs in two additional regions of interest within the same feather (figure S5). Although the smaller proximal hooks in these sections are less distinct, this may reflect developmental variation during melanosome formation along the feather barb. Significantly smaller hooks were also observed in cross-sections of in-situ feather barbs from the anterior side of the feather crest (figure S6). The mean long axis (z) of the melanosomes is approximately 1774 nm (Table S2). Based on these observations, we propose that the hooked structures, particularly those on the dorsal, ventral, proximal, and distal sides of the melanosomes, enhance the structural integrity of the barb (figure S7). However, these features may be teratological and unique to this individual, as no similar structures have been reported in other sampled feathers. These hooks may stabilize the stacked melanosome rods and contribute to increased barb dimensions, such as diameter and length. The sections exhibit modified (or asymmetric) hexagonally packed melanosomes with the presence of extra hooked linkages (Figure 3c-d and e-f). 
The long rod-like melanosomes are different from all other known feather melanosomes from both extant and extinct taxa in having some extra hooks and an oblique ellipse shape in cross and longitudinal sections of individual melanosomes (Durrer 1986, Zhang, Kearns et al. 2010). The asymmetric packing of the melanosomes (the major axis leans leftward) played a major role in the reduction of fossilized keratinous matrix within the barbs, which may correspond to a novel structural coloration in this extinct bird. The close-packed hexagonal melanosome pattern found in extant avian feathers yields rounded melanosome outlines, in contrast to the oval-shaped melanosomes (see figure S8, x<y) in the perpendicular section here. The asymmetric compact hexagonal packing (ACHP) of the melanosomes is different from the known pattern of melanosomes formed in the structure of barbules among extant birds (Eliason and Shawkey 2012), which has been seen as a regular hexagonal organization. The packing of the melanosomes in an asymmetric pattern, on the microscopic level, might be related to the asymmetrical path of the barb extension direction observed at the macroscopic level (figure S5).

      Added Supplemental figure S5. STEM images of cross-sections taken from three different positions (indicated by white dashed lines in a) demonstrate similar melanosome packing styles. Dashed lines labeled in (a) indicate the positions where these sections were taken; black arrows indicate the individual barbs that accumulated together in this long crest feather. One distinct feature of these sections is the hooked-link structure that aligns the melanosomes into a modified hexagonal, packed arrangement. White arrows (in c, e, g) indicate the hooked structures observed in the selected melanosomes.

      Added Supplemental figure S6. STEM images showing melanosome structure from three fragments of the feather crest (indicated by dashed lines and white box in a) reveal the hooked linkages between melanosomes and their surrounding melanosomes structures in (b), (c) and (d). Due to the shorter length of these feather barbs, the hook structures are not as well-defined as those in the longer feather samples shown in the main text.

      Keratin: the authors use such a term pretty often in the text, but how is this inference justified in the fossil? Can the authors extend on this? Previous studies suggested the presence of degradation products deriving from keratin, rather than immaculated keratin per se.

      We changed the term to keratinous matrix or material instead. We observed that the space between these melanosomes was filled by organic-rich tissue, which we propose may be taphonomically altered keratin.

      Ontogenetic assessment: the authors infer a sub-adult stage for the specimen, but no evidence or discussion is reported in the SI. Can the authors describe and discuss their interpretations?

      Thanks for the suggestion. We made an osteo-histological section and added our evaluation of the histology of the femoral bone tissue sampled from the specimen to justify our assessment of its ontogenetic stage.

      See Supplemental figure S2 for Femur Osteo-Histology

      SI file Femur Osteo-Histology

      Ground sections were acquired from the right side of the femur to assess the osteo-histological features of the bone and its ontogenetic stage. As shown in figure S2, long, flat-shaped lacunae are widely present and densely packed throughout the major part of the bone section. Very few secondary osteocytes are present, and parallel-fibered bone tissue is underdeveloped. The flattened osteocyte lacunae dominate the cellular shape, with observable vascular canals connecting different lacunae. Overall, the osteo-histology indicates that the bird was still in an active growth stage at the time of death, suggesting it was in its sub-adult growth phase.

      CT scan data: these data should be made freely available upon publication of the study.

      We will release our CT scan data on an open server (https://osf.io/kw7sd/) along with the final version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The paper presents an in-depth analysis of the original colour of a fossil feather from the crest of a 125-million-year-old enantiornithine bird. From its shape and location, it would be predicted that such a feather might well have shown some striking colour and pattern. The authors apply sophisticated microscopic and numerical methods to determine that the feather was iridescent and brightly coloured and possibly indicates this was a male bird that used its crest in sexual displays.

      Strengths:

      The 3D micro-thin-sectioning techniques and the numerical analyses of light transmission are novel and state-of-the-art. The example chosen is a good one, as a crest feather is likely to have carried complex and vivid colours as a warning or for use in sexual display. The authors correctly warn that without such 3D study feather colours might be given simply as black from regular 2D analysis, and the alignment evidence for iridescence could be missed.

      Weaknesses: Trivial.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      In a few places, the paper can be strengthened:

      Dimensionality of study method: In the first paragraph, you set things up (lines 60-62) to say that studies hitherto have been of melanosomes and packing in two dimensions... and I then expect you to say soon after, in the next paragraph, 'Here, we investigate a fossil feather in three dimensions...' or some such, but you don't.

      You come back to Methods at the end of the Introduction (lines 97-101), but again do not say whether you model the feather in three dimensions or not. Yes, you did - I finally learned at line 104 - you did micro serial sectioning. This needs to shift a long forward into the Introduction.

      Thanks for the suggestions. We used serial sectioning to obtain different views of the microbodies that are proposed to be melanosomes, and reconstructed the three-dimensional volume of the melanosomes as well as the intercalated keratin.

      We restructured the introduction and now make clear, earlier in the text, that the three-dimensional data obtained in this study were also used for modeling.

      In the Results, there are not enough references to images. It's not enough to refer generally to 'Figures 3c-f' [line 133] and then go on to rapidly step through some amazing imagery (text lines 133-146) - you need to add an image citation to each observation so readers can know exactly which image is being described each time.

      We elaborated our description of the imaging to better describe the melanosomes in our results section. We added the description of the stack of melanosomes given above (reply to Reviewer #2).

      The 3D data in Figures 3 and 4 is great and based on huge technical wizardry. The sketch model in Figure 4a is excellent, but could you not attempt an actual 3D block diagram showing the hexagonal arrangement of clusters of aligned melanosomes?

      We have also tried FIB-SEM at an additional site to validate our ultrathin section data. See the SI file.

      Added figure S7. Targeted feather barb block prepared in FIB-SEM, with volume rendering reconstruction based on the acquired sequential cross-sectional images; the volume reconstruction is visualized in the x-y plane (c-cross section view) and in x-z plane (d-sagittal section view).

      Modified Figure S8d shows the 3D model of aligned melanosomes. To show the arrangement more clearly, the schematic XY cross-section of the melanosomes 3D model is shown below (also shown in Supplementary Figure S8d).

      35: delete 'yield'

      Changed

      73: 'feather fell' ? = 'feather that has fallen'

      Changed

      305: excises ?= exercises

      Changed

    1. eLife Assessment

      This important study explores the regulation of collective cell migration and tissue patterning in the zebrafish posterior lateral line primordium by SoxB1 transcription factors. The authors provide evidence that SoxB1 genes interact with Wnt and Fgf signaling pathways to control neuromast deposition and spacing, a process central to sensory organ development. The work offers mechanistic insight into the self-organization of migrating tissues and adds to the understanding of how transcriptional networks integrate with signaling pathways during morphogenesis. However, the strength of the evidence supporting several key conclusions is incomplete due to insufficient validation of mutant and knockdown tools, lack of quantitative analysis, and unclear experimental design details; additional quantification and more rigorous verification of gene knockdown or loss-of-function tools are needed to support the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Palardy and colleagues examine how transcription factors of the SoxB1 family alter patterning within the zebrafish posterior lateral line primordium and subsequent formation of neuromast organs along the body of the developing fish. They describe how expression of soxb genes changes when Wnt and Fgf signaling pathways are altered, and in addition, how outputs of these signalling pathways change when soxb gene expression is disrupted. Together, experiments suggest a model where the expression of SoxB genes counteracts Wnt signaling. Support comes from the combined inhibition of both pathways, partially restoring the pattern of neuromast deposition. Together, the work reveals an additional layer of control over Wnt and Fgf signals that together ensure proper posterior lateral line development.

      Strengths:

      The authors provide a clear analysis of changes in RNA expression after systematic manipulation of gene expression and signaling pathways to construct a plausible model of how Sox factors regulate primordium patterning.

      Weaknesses:

      There is little attempt to capture the variation of expression patterns with each manipulation. Photomicrographs are examples, with little quantification.

      While the combined loss of soxb functions shows more severe phenotypes, it is not exactly clear what underlies the apparent redundancy. It would be helpful if the soxb gene family member expression was reported after loss of each. Expression of sox1a is shown in sox2 mutants in Figure 4, but other combinations are not reported. This additional analysis would clarify whether there are alterations in expression that influence apparent redundancy.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript seeks to determine the molecular basis of tissue patterning in the collectively migrating cells of the zebrafish posterior lateral line primordium. In particular, the authors examine the cross-regulation of canonical Wnt signaling, Fgf signaling, and the SoxB1 family members Sox1a, Sox2, and Sox3 in the migrating primordium. Using a combination of mutant lines, morpholino (MO) knockdown, pharmacological inhibition, and dominant-negative inhibition, the authors propose a model in which Sox2 and Sox3 in the trailing region of the primordium restrict Wnt signaling to the leading region, facilitating the formation of rosettes and the deposition of the first formed neuromast downstream of Fgf pathway activity. In contrast, sox1a is expressed in the leading region of the primordium, and the sox1ay590 -/- mutant shows little phenotype on its own. Together, the authors propose a multistep signaling loop that regulates tissue patterning during lateral line collective cell migration.

      Strengths:

      The zebrafish posterior lateral line primordium is a well-established model for the study of collective cell migration that is useful for genetic manipulation and live imaging. The manuscript seeks to understand the complex reciprocal regulation of signaling pathways that regulate tissue patterning of collectively migrating cells.

      Weaknesses:

      (1) The primary tools used in this study are inadequate to support the authors' conclusions.

      A. The authors state that the phenotype of the sox2y589 homozygous mutant line described in this manuscript changed across generations, but do not specify which generation is used for any given experiment. The sox2y589 mutant line is not properly verified in this manuscript, which could be done by examining anti-Sox2 antibody labeling, Western blot analysis, or complementation to the existing sox2x50 line described in Gou et al., 2018a and Gou et al., 2018b. There are also published sox1a mutant lines (Lekk et al., 2019).

      B. The authors acknowledge that the sox2 MO1 used in this manuscript also alters sox3 function, but do not redo the experiments with a specific sox2 MO. In addition, the authors show that the anti-Sox2 and anti-Sox3 antibody labeling is reduced but not absent in sox2 MO1 and sox3 MO-injected embryos, but do not show antibody labeling of the sox2 MO and sox3 MO-double injected embryos to determine if there is an additional knockdown.

      C. The authors examine RNA in situ hybridization patterns of sox2 and sox3 following various manipulations, but do not use anti-Sox2 and anti-Sox3 antibody labeling, which would provide more quantifiable information about changes in patterning.

      (2) The manuscript lacks important experimental details and appropriate quantification of results.

      A. It is unclear for most of the experiments described in this manuscript how many individual embryos were examined for each experiment and how robust the results are for each condition. Only Figure 3 includes information about the numbers for each experiment, and in all cases, the experimental manipulations are not fully penetrant, and there is no statistical analysis.

      B. It is not clear at what stage most of the RNA in situ hybridizations were performed.

      C. The manuscript lacks quantification of many of the experiments, making it difficult to conclude their significance.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to understand the molecular underpinnings of the complex process of periodic deposition of the neuromast organs of the embryonic posterior lateral line (PLL) sensory system in zebrafish. It was previously established that Fgf signaling in the trailing zone of the migrating PLL primordium is key to protoneuromast establishment, while Wnt signaling in the leading zone must be downregulated to allow new Fgf signaling-dependent protoneuromasts to form. Here, the authors evaluate the role of three SoxB transcription factors (Sox1a, Sox2, and Sox3) in this complex process, generating two novel CRISPR mutants as part of their study. They interrogate the interplay of the SoxB genes with the Fgf and Wnt signaling pathways during PLL primordium migration, using a combination of genetics, knockdown, and imaging approaches, including live time-lapse studies. They report a key role for the SoxB genes in regulating the pace of protoneuromast maturation as the primordium migrates, thus ensuring appropriate deposition and spacing of the neuromast organs.

      Strengths:

      Strengths of the study are the careful quantitative analysis, based on imaging approaches, of the impact of mutation or knockdown of SoxB genes, coupled with the use of heat shock inducible dominant negative strategies to address how SoxB genes interact with Wnt and Fgf signaling. Functional analyses convincingly uncover a SoxB regulatory network that serves to limit Wnt activity, as directly read out with a live Wnt reporter. The finding that Wnt inhibition (achieved using pharmacological reagents) rescues the SoxB deficiency phenotype provides compelling evidence of the centrality of the Wnt pathway in mediating SoxB function. Use of atoh1 markers to track the stages of development of the neuromasts provides an effective approach to following their maturation, and allows the authors to explore how SoxB/Wnt interplay ultimately translates into the establishment of functional neuromasts. Finally, loss of Sox2 function, together with loss of either Sox1a or Sox3, blocks maturation of the neuromasts, clearly establishing redundancy between these SoxB family genes.

      The concepts introduced and explored in this study (complex gene networks that work within a dynamic cellular environment to enable self-organization and ultimately stabilization of cell fate choices) provide a useful conceptual framework for future studies. This study is therefore of relevance to understanding the morphogenesis of self-organizing tissues more broadly.

      Weaknesses:

      A minor weakness is the use of SoxB morpholino (MO) knockdown reagents, which are interspersed with mutant analyses. Although the stable mutants are available, they would be challenging to couple with the reporter transgenes used for some of the experiments, providing a reasonable rationale for the use of MO reagents (although the authors don't overtly provide this rationale). Moreover, reduced penetrance of the Sox2 mutants over multiple generations is noted, but no detailed explanation for this finding is offered.

      Given that the expression patterns of Sox1a and Sox3 are not merely different but are largely reciprocal, the mechanistic basis of their very similar double mutant phenotypes with Sox2 remains opaque. Related to this, the authors discuss that Sox1a/Sox2 double knockdown produces a more severe phenotype than Sox2/Sox3 double knockdown, yet this difference is not obviously reflected in the data, some of which is not shown.

    5. Author response:

      We would like to thank the three reviewers for the careful review and thoughtful comments on our manuscript. In addition to providing useful suggestions, they uncovered some embarrassing oversights on our part related to experimental details, including the number of embryos and quantification of variance in the observed changes for some of the experiments, which were inadvertently omitted from the submission. We provide below an initial response to the reviewers' public reviews and expect to submit a revised manuscript comprehensively addressing all their concerns.

      I would like to start by addressing some of their most critical comments related to validation of the tools used to reduce soxB1 gene family function in the embryo. In the absence of the critical supplementary data that we inadvertently failed to include, the reviewers were left with an understandable but, we feel, erroneous impression that there was insufficient validation of mutant and knockdown tools.

      Reviewer #2 says “The sox2y589 mutant line is not properly verified in this manuscript, which could be done by examining anti-Sox2 antibody labeling, Western blot analysis or…”

      This validation, which had been performed previously both with antibody staining and with western blot analysis, was inadvertently omitted from the supplementary data submitted with the paper. The western blot data is shown here.

      Author response image 1.

      Validation of sox2 mutant phenotype with Western blot.

      Lysates were prepared from 25 embryos selected as wild type or potentially mutant based on the “loss of L1” phenotype at 6 dpf. This polyclonal antibody recognizes an epitope within the last 16 amino acids of the C-terminus.

      Author response image 2.

      Validation of sox2 mutant phenotype with antibody staining.

      Though in this experiment there was considerable background in the red channel, and it shows the lateral line nerve, loss of nuclear Sox2 expression is evident in the deposited neuromast of an embryo identified as a mutant based on its delayed deposition of the L1 neuromast.

      This data and a repeat of the antibody staining showing the primordium with loss of Sox2 will be included in a revised manuscript.

      Furthermore, Reviewer #2 comments “the authors show that the anti-Sox2 and anti-Sox3 antibody labeling is reduced but not absent in sox2 MO1 and sox3 MO-injected embryos, but do not show antibody labeling of the sox2 MO and sox3 MO-double injected embryos to determine if there is an additional knockdown”

      This will be included in a revised manuscript.

      Reviewer #2:

      The authors acknowledge that the sox2 MO1 used in this manuscript also alters sox3 function, but do not redo the experiments with a specific sox2 MO

      This is not exactly true. Having discovered that sox2 MO1 simultaneously reduces sox2 and sox3 function, we obtained three new morpholinos based on another paper (Kamachi et al 2008), which had quantitatively assessed the efficacy of three sox2-specific morpholinos (sox2 MO2, sox2 MO3, and sox2 MO4). The effects of these morpholinos on the pattern of L1 deposition were compared to those of sox2 MO1. This comparison was shown in supplementary Figure 2 and is included below. It shows that the sox2-specific morpholinos resulted in a poorly penetrant delay in deposition of L1, comparable to that of a sox2 mutant, which was quantified in supplementary Figure 3B. The observations with these three sox2-specific morpholinos independently supported the observations made with the sox2 mutant: reduction of sox2 on its own results in a delay in deposition of the first neuromast with low penetrance, and to effectively examine the role of these SoxB1 genes in the primordium, their function needs to be compromised in a combinatorial manner. This conclusion was independently supported by observations made by crossing sox1a, sox2 and sox3 mutants (Figure 3 and Supplementary Figure 3). Therefore, even though the initial use of a sox2 morpholino that simultaneously knocks down sox3 was unintentional, it turned out to be useful: it allowed us to examine the effects of knocking down sox2 and sox3 with a single morpholino. Furthermore, though this project was initiated more than 15 years ago to specifically understand sox2 function, our focus had shifted to understanding the role of the soxB1 family members sox1a, sox2 and sox3 functioning together as an interacting system that regulates Wnt activity in the primordium. Considering this broader focus, reflected in the title of the paper, it was not a priority to repeat every experiment previously done with sox2 MO1 using the new sox2-specific morpholinos. 
Instead, having acknowledged the “limitations” of sox2 MO1, we used it to better understand the effects of combinatorial reduction of SoxB1 function.

      Reviewer #1:

      It is not exactly clear what underlies the apparent redundancy. It would be helpful if the soxb gene family member expression was reported after loss of each.

      As suggested by Reviewer #1, we had previously looked at changes in expression of each of the soxB1 factors following loss of individual soxB1 factors, but did not include these data in the supplementary material of the original submission. Apart from a reproducible and consistent expansion of sox1a expression into the trailing zone following loss of sox2 function, which is reported in the paper and quantified here (10/10 mutant embryos showed the expansion; compare the region within the bracket in WT and sox2 -/-), no consistent changes in the expression of other soxB1 family members were observed that might account for compensation when function of a particular soxB1 factor is lost. The data shown here, together with more extensive quantification of changes, will be included in a revised version of the manuscript. At this time, the only consistent change was the expansion of sox1a into the trailing zone when sox2 function is lost. This change reflects dependence of sox1a on Wnt activity and the fact that Wnt activity expands into the trailing zone when sox2 function is lost.

      Author response image 3.

      Reviewer #3:

      Given that the expression patterns of Sox1a and Sox3 are not merely different but are largely reciprocal, the mechanistic basis of their very similar double mutant phenotypes with Sox2 remains opaque.

      The simplest way to think about compensation for gene function in a network is to think of it being determined by expression of a homolog or another gene with a similar function being expressed in a similar or overlapping domain. However, it is more useful to think of Sox2 function in the primordium as part of an interacting network of SoxB1 factors whose differential regulatory mechanisms create a robust system that simultaneously regulates two key aspects of Wnt activity in the primordium: how high Wnt activity is allowed to get in the leading zone and how effectively it is shut off to facilitate protoneuromast maturation in the trailing zone. These features of Wnt activity influence both when and where nascent protoneuromasts will form in the wake of a progressively shrinking Wnt system and where they undergo effective maturation and stabilization prior to deposition. Changes in individual SoxB1 expression patterns provide some hints about how some SoxB1 factors may compensate when function of one or more of these factors is compromised. However, a deeper understanding of robustness and “compensation” will require a systems-level understanding of this gene regulatory network with computational models, which we are currently working on in our group. It remains possible, for example, that how far into the trailing zone the Wnt activity has an influence is regulated at least in part by how high it is allowed to get in the leading zone by sox1a. Conversely, how high Wnt activity gets in the leading zone may be influenced by how effectively it is shut off in the trailing zone by sox2 and sox3, as this influences the size of the Wnt system, which in turn can influence the overall level of Wnt activity. In this manner Sox1a may cooperate with Sox2 and Sox3 to limit both how high Wnt activity is allowed to get in the primordium and to effectively shut it off in the trailing zone.

      Reviewer #3:

      Related to this, the authors discuss that Sox1a/Sox2 double knockdown produces a more severe phenotype than Sox2/Sox3 double knockdown, yet this difference is not obviously reflected in the data.

      The severity of the sox1a/sox2 double mutant phenotype compared to that of the sox2/sox3 double mutant is shown in Figure 3 K and N, and quantified in Supplementary Figure 3A. Simultaneous loss of sox2 and sox3 results in a small but relatively penetrant delay in where the first stable neuromast is deposited (Figure 2 N). By contrast, loss of sox2 and sox1a together consistently results in a longer delay in deposition of the first stable neuromast (Figure 2 K). A new graph, shown below, which will be incorporated in the revised paper, shows that there is a significant difference in the pattern of L1 deposition in sox1a<sup>-/-</sup>, sox2<sup>-/-</sup> and sox2<sup>-/-</sup>, sox3<sup>-/-</sup> double mutants.

      Author response image 4.

      All 3 datasets were found to be normally distributed by the Shapiro-Wilk test. One-way ANOVA showed significance (p<0.0001), with Tukey’s multiple comparisons test showing a significant difference between all 3 conditions. (***p=0.0008, ****p<0.0001)

      Reviewer #1:

      It would be good to more clearly state why sox3 is not regulated by Wnt given its expression is inhibited by the delta TCF construct (Figure 2M).

      The explanation for why we believe sox3 expression is determined by Fgf signaling, and not Wnt activity, requires integrating what is observed with induction of both the delta TCF construct and the dominant negative Fgf receptor (DN FgfR). Loss of sox3 expression with induced expression of the delta TCF construct could result from loss of Wnt activity or from the downstream loss of Fgf activity, which is ultimately dependent on Fgfs secreted by Wnt active cells in the leading domain. Distinguishing between these possibilities is based on inhibition of Fgf signaling with the DN FgfR, described in the next paragraph. Heat shock-induced expression of DN FgfR results in loss of Fgf signaling and the simultaneous expansion of Wnt activity into the trailing zone. As explained in the original text, loss of sox3 expression in this context, rather than its expansion, suggests its expression is determined by Fgf signaling, not Wnt activity. We will emphasize that its loss, rather than its expansion, following induction of DN FgfR indicates that its expression is determined by Fgf signaling, not Wnt activity.

      Reviewer #2:

      The manuscript lacks quantification of many of the experiments, making it difficult to conclude their significance.

      One of the biggest inadvertent omissions of the paper was the inadequate quantification of some of the results. Quantification of results with considerable variation in the outcome, like the pattern of L1 deposition, was provided following manipulations where function of various combinations of sox1a, sox2, and sox3 was lost (Figure 3, Supplementary Figures 2 and 3) or where sox2MO1/sox3MO was used with or without IWR (Figure 5 and Figure 6). However, numbers for the experiments in Figure 2 were omitted from the figure legend; typically about 10 embryos for each manipulation were photographed and scored, and a representative image was used to make the figure. In these experiments there was a very consistent result, with 100% of the embryos showing the changes represented by each panel in Figure 2. The only exception was Figure 2Y, where 9/10 embryos showed the described change. Similarly, in Figure 4 there was a consistent result and 100% of embryos showed the change shown. Numbers and statistics for these results will be included in a revised manuscript.

      Reviewer #2:

      The statistical analysis in Figure 5 and Supplementary Figures 2 and 3 should be one-way ANOVA or Kruskal-Wallis with a Dunn's multiple comparisons test rather than pair-wise comparisons.

      The analysis has been re-done following the reviewer’s suggestions. The analysis confirms the primary conclusions of the original submission, and this analysis will be incorporated in a revised manuscript. However, to improve the power of the analysis, experiments with low numbers of embryos will be repeated.
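
      For readers less familiar with the omnibus test the reviewer requests, the following is a minimal sketch of how the F statistic of a one-way ANOVA is computed across several groups at once, instead of running separate pair-wise tests. It is illustrative only: the function name `one_way_anova_f` and the toy numbers are assumptions made for this sketch, not data or code from the study. Obtaining a p-value and running a post hoc test (Tukey or Dunn) would additionally require a statistics package, which is not shown here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per condition.
    """
    k = len(groups)                          # number of conditions
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy, made-up measurements for three hypothetical conditions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A single omnibus test of this kind keeps the family-wise error rate under control; the post hoc test then identifies which pairs of conditions differ.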

      See redone graphs in Figure 5 and Supplementary Figures 2 and 3.

    1. eLife Assessment

      This study provides an important method to model the statistical biases of hypermutations during the affinity maturation of antibodies. The authors show convincingly that their model outperforms previous methods with fewer parameters; this is made possible by the use of machine learning to expand the context dependence of the mutation bias. They also show that models learned from nonsynonymous mutations and from out-of-frame sequences are different, prompting new questions about germinal center function. Strengths of the study include an open-access tool for using the model, a careful curation of existing datasets, and a rigorous benchmark; it is also shown that current machine-learning methods are limited by the availability of data, which explains the only modest gain in model performance afforded by modern machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      * The new model structure is quite clever and will provide a powerful way to explore larger models.
      * Careful attention is paid to curating and processing large existing data sets.
      * The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Weaknesses:

      * No significant weaknesses noted

    3. Reviewer #2 (Public review):

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity. Key findings include:

      • Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.
      • Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM, or different biases in the type of data.
      • Open Source Contributions: The release of a Python package and pretrained models adds practical value for the community.

      However, readers should be aware of the limitations. The improvements over existing models are modest, and the work is constrained by the availability of high-quality out-of-frame sequence data. The study also highlights that more complex modeling techniques, like transformers, did not enhance predictive performance, which underscores the role of data availability in such studies.

    4. Reviewer #3 (Public review):

      Summary:

      Modeling and estimating sequence context biases during B cell somatic hypermutation is important for accurately modeling B cell evolution to better understand responses to infection and vaccination. Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model's performance with rigorous testing across multiple subjects and datasets. Prior work has captured the mutation biases of fixed 3-, 5-, and 7-mers, but each of these expansions has significantly more parameters. The authors developed a machine-learning-based approach to learn these biases using wider contexts with comparatively few parameters.
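
      The parameter argument in this summary can be made concrete with a rough count. The sketch below compares the number of free parameters in a fixed k-mer model (one per possible k-mer) with a hypothetical network built from a 3-mer embedding followed by a single 1D convolution and a linear output. The layer layout, `embedding_dim`, and `num_filters` are assumptions chosen for illustration and are not the hyperparameters reported in the paper; only the kernel sizes 7, 9, and 11 are mentioned in the reviews.

```python
def kmer_param_count(k: int) -> int:
    """A fixed k-mer model has one free parameter per possible DNA k-mer."""
    return 4 ** k


def cnn_param_count(embedding_dim: int, num_filters: int, kernel_size: int) -> int:
    """Rough parameter count for a hypothetical 3-mer embedding + 1D conv model.

    Assumed layout (illustrative, not the paper's exact architecture):
    embedding of the 64 possible 3-mers -> one conv layer -> linear output.
    """
    embedding = 4 ** 3 * embedding_dim                            # 64 rows
    conv = num_filters * embedding_dim * kernel_size + num_filters  # weights + biases
    output = num_filters + 1                                      # weights + bias
    return embedding + conv + output


def context_width(kernel_size: int) -> int:
    """Nucleotides seen per site: a kernel over overlapping 3-mers spans kernel_size + 2 bases."""
    return kernel_size + 2


for k in (3, 5, 7):
    print(f"{k}-mer model: {kmer_param_count(k)} parameters")

# Hypothetical settings, chosen only to show the scaling.
for kernel in (7, 9, 11):
    n = cnn_param_count(embedding_dim=7, num_filters=16, kernel_size=kernel)
    print(f"kernel {kernel}: context {context_width(kernel)} nt, {n} parameters")
```

      The k-mer count grows exponentially with context width, while the convolutional count grows linearly with kernel size, which is the scaling behavior the reviewers describe.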

      Strengths:

      Well motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      The authors have addressed my prior comments.

    5. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      (1) The new model structure is quite clever and will provide a powerful way to explore larger models.

      (2) Careful attention is paid to curating and processing large existing data sets.

      (3) The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Thank you very much for your comments. We especially appreciate the last comment, as we have indeed tried hard to do so.

      Weaknesses:

      (1) 10x/single cell data has a fairly different error profile compared to bulk data. A synonymous model should be built from the same briney dataset as the base model to validate the difference between the two types of training data.

      Thank you for pointing this out.

      We have repeated the same analysis with synonymous mutations derived from the bulk-sequenced tang dataset for Figure 4 and the supplementary figure. The conclusion remains the same. We used tang because only the out-of-frame sequences were available to us for the briney dataset, as we were using preprocessing from the Spisak paper. The fact that both the 10x and the tang data give the same results bolsters our claim.

      (2) The decision to test only kernels of 7, 9, and 11 is not described. The selection/optimization of embedding size is not explained. The filters listed in Table 1 are not defined.

      We have added the following to the Models subsection to further explain these decisions:

      “The hyperparameters for the models (Table 1) were selected with a run of Optuna (Akiba et al., 2019) early in the project and then fixed. Further optimization was not pursued because of the limited performance differences between the existing models.”

      Reviewer #2 (Public Review):

      Summary:

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity.

      Key findings:

      (1) Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.

      (2) Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM or different biases.

      (3) Open Source Contributions: The release of a Python package and pre-trained models adds practical value for the community.

      Thank you for your positive comments. We believe that we have been clear about the modest improvements (e.g., the abstract says “slight improvement”), and we discuss the data limitations extensively. If there are ways we can do this more effectively, we are happy to hear them.

      Reviewer #3 (Public Review):

      Summary:

      Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model’s performance with rigorous testing across multiple subjects and datasets.

      Strengths:

      Well-motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      Thank you for your positive comments.

      Weaknesses:

      This study could be improved with better descriptions of dataset sequencing technology, sequencing depth, etc.

      We have added columns to Table 3 that report sequencing technology and depth for each dataset.

      Reviewer #1 (Recommendations for the Authors):

      (1) There seems to be a contradiction between Tables 2 and 3 as to whether the Tang et al. dataset was used to train models or only to test them.

      Thank you for catching this. The "purpose" column in Table 3 was for the main analysis, while Table 2 is describing only models trained to compare with DeepSHM. Explaining this seems more work than it's worth, so we simply removed that column from Table 2. The dataset purposes are clear from the text.

      (2) In Figure 4, I assume the two rows correspond to the Briney and Tang datasets, as in Figure 2, but this is not explicitly described.

      Yes, you are correct. We added an explanation in the caption of Figure 4.

      (3) Figure 2, supplement 1 should include a table like Table 1 that describes these additional models.

      We have added an explanation in the caption to Table 1 that "Medium" and "Large" refer to specific hyperparameter choices. The caption to Figure 2, supplement 1 now describes the corresponding hyperparameter choices for "Small" thrifty models.

      (4) On line 378 "Therefore in either case" seems extraneous.

      Indeed. We have dropped those words.

      (5) In the last paragraph of the Discussion, only the attempt to curate the Ford dataset is described. I am not sure if you intended to discuss the Rodriguez dataset here or not.

      Thank you for pointing this out. We have updated the Materials and Methods section to include our attempts to recover data from Rodriguez et al., 2023.

      (6) Have you looked to see if Soto et al. (Nature 2019) provides usable data for your purposes?

      Thank you for making us aware of this dataset!

      We assessed it but found that the recovery of usable out-of-frame sequences was too low to be useful for our analysis. We now describe this evaluation in the paper.

      (7) Cui et al. note a high similarity between S5F and S5NF (r=0.93). Does that constrain the possible explanations for the divergence you see?

      This is an excellent point.

      We don't believe the correlation observed in Cui and our results are incompatible. Our point is not that the two sources of neutral data are completely different but that they differ enough to limit generalization. Also, the Spearman correlation in Cui is 0.86, which aligns with our observed drop in R-precision.
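
      Since the response leans on R-precision, a minimal sketch of that metric may help. It uses the standard information-retrieval definition: with R truly relevant items, R-precision is the fraction of the R top-ranked items that are relevant. Applying it to sites ranked by predicted mutability, with the observed mutated sites as the relevant set, is an assumption made for this sketch, and the function name and toy values are hypothetical.

```python
def r_precision(scores, mutated_sites):
    """Fraction of the R top-scoring sites that are truly mutated, where R = len(mutated_sites).

    `scores` holds one predicted mutability value per site; `mutated_sites`
    is the set of site indices at which a mutation was observed.
    """
    r = len(mutated_sites)
    if r == 0:
        raise ValueError("R-precision is undefined when no site is mutated")
    # Rank site indices from highest to lowest predicted mutability.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:r] if i in mutated_sites)
    return hits / r


# Toy, made-up scores for a five-site sequence.
toy_scores = [0.10, 0.90, 0.30, 0.80, 0.05]
print(r_precision(toy_scores, {1, 3}))
print(r_precision(toy_scores, {1, 2}))
```

      Because the metric depends only on which sites land in the top R, two models whose scores correlate well overall can still differ in R-precision when they disagree about the highest-ranked sites.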

      (8) Are you able to test the effects of branch length or background SHM on the model?

      We're unsure what is meant by “background SHM.”

      We did try joint optimization of branch length and model parameters, but it did not improve performance. Differences in clone size thresholds do exist between datasets, but Figure 3 suggests that tang is better sequence data.

      (9) Would the model be expected to scale up to a kernel of, say, 50? Would that help yield biological insight?

      We did not test such large models because larger kernels did not improve performance.

      While your suggestion is intriguing, distinguishing biological effects from overfitting would be difficult. We explore biological insights more directly in our recent mechanistic model paper (Fisher et al., 2025), which is now cited in a new paragraph on biological conclusions.

      Reviewer #2 (Recommendations for the Authors):

      (1) Consider applying a stricter filtration approach to the Briney dataset to make it more comparable to the Tang dataset.

      Thank you. We agree that differences in datasets are interesting, though model rankings remain consistent. We now include supplementary figures comparing synonymous and out-of-frame models from the tang dataset.

      (2) You omit mutations between the unmutated germline and the MRCA of each tree. Why?

      The inferred germline may be incorrect due to germline variation or CDR3 indels, which could introduce spurious mutations. Following Spisak et al. (2020), we exclude this branch.

      Yes, singletons are discarded: ~28k in tang and ~1.1M in jaffe.

      (3) Could a unified model trained on both data types offer further insights?

      We agree and present such an analysis in Figure 4.

      (4) Tree inference biases from parent-child distances may impact the results.

      While this is an important issue, all models are trained on the same trees, so we expect any noise or bias to be consistent. Different datasets help confirm the robustness of our findings.

      (5) Simulations would strengthen validation.

      We focused on real datasets, which we view as a strength. While simulations could help, designing a meaningful simulation model would be nontrivial. We have clarified this point in the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      There are typos in lines 109, 110, 301, 307, and 418.

      Thank you, we have corrected them.

    1. eLife Assessment

      This study presents a valuable finding on the delivery of a nuclear envelope protein to lysosomes and the impact of C-terminal tagging on its traffic. The authors provide solid evidence for the potential artifacts introduced by large terminal tags, particularly in the context of membrane protein localization and stability.

    2. Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding placement of tags and how this influences protein fate.

      Weaknesses:

      The specific biogenesis pathway for C-terminally tagged emerin might confound some interpretations. Appending the large GFP to the C-terminus may direct the fusion protein to a different ER insertion pathway than that used by the endogenous protein. How this might influence the fate of the tagged protein remains to be determined. In some ways this is beyond the scope of the current study, but should serve as a warning to epitope-tagging approaches.

    3. Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group demonstrated that C-terminally GFP-tagged Emerin traffics to the plasma membrane and is eventually targeted to lysosomes for degradation. It has been suggested that the C-terminal tagging of TA proteins may shift their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. Consistent with this, the authors confirm that C-terminal GFP tagging causes Emerin to mislocalize to the plasma membrane and subsequently to lysosomes.

      In this study, they investigate the mechanism underlying this misrouting. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors show that an ER retention sequence and increased TMD hydrophobicity contribute to Emerin's trafficking through the secretory pathway.

      This reviewer had previously raised the concern that the potential role of the GFP tag within the ER lumen in promoting secretory trafficking was not addressed. In the revised manuscript, the authors respond to this concern by examining the co-localization of Emerin-GFP with the ER exit site marker Sec31A. Their data show that the presence of the C-terminal GFP tag increases Emerin's propensity to engage ER exit sites, supporting the conclusion that GFP tagging promotes its entry into the secretory pathway.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for the redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding the placement of tags and how this influences protein fate.

      Weaknesses:

      Biogenesis pathways are not explored experimentally: it would be interesting to know if the lysosomal pool arrives there via the secretory pathway (eg by engineering a glycosylation site into the lumenal domain) or by autophagy, where failed insertion products may accumulate in the cytoplasm and be degraded directly from cytoplasmic inclusions.

      This manuscript is a Research Advance that follows previous work that we published in eLife on this topic (Buchwalter et al., eLife 2019; PMID 31599721). In that prior publication, we showed that emerin-GFP arrives at the lysosome by secretion and exposure at the PM, followed by internalization. While we state these previous findings in this manuscript, we did not explicitly restate here how we came to that conclusion. In the 2019 study, we (i) engineered in a glycosylation site, which demonstrated that emerin-GFP receives complex, Endo H-resistant N-glycans, indicating passage through the Golgi; (ii) performed cell surface labeling, which confirmed that emerin accesses the PM; and interfered with (iii) the early secretory pathway using brefeldin A and with (iv) lysosomal function using bafilomycin A1. Further, we ruled out autophagy as a major contributor to emerin trafficking by treating cells with the PI3K inhibitor KU55933, which had no effect on emerin’s lysosomal delivery.

      It would be helpful if the topology of constructs could be directly demonstrated by pulse-labelling and protease protection. It's possible that there are mixed pools of both topologies that might complicate interpretation.

      We demonstrate that emerin’s TMD inserts in a tail-anchored orientation (C terminus in ER lumen) by appending a GFP tag to either the N or C terminus, followed by anti-GFP antibody labeling of unpermeabilized cells (Fig. 1G). This shows the preferred topology of emerin’s wild type TMD.

      As the reviewer points out, it is possible that our manipulations of the TMD sequence (Fig. 2D-E) alter its preferred topology of membrane insertion. We addressed this question by performing anti-GFP and anti-emerin antibody labeling of the less hydrophobic TMD mutant (EMD-TMDm-GFP) after selective permeabilization of the plasma membrane (Figure 2 supplement, panel F). If emerin biogenesis is normal, the GFP tag should face the ER lumen while the emerin antibody epitope should be cytosolic. If the fidelity of emerin’s membrane insertion is impaired, the GFP tag could be exposed to the cytosol (flipped orientation), which would be detected by anti-GFP labeling upon plasma membrane permeabilization. We find that the C-terminal GFP tag is completely inaccessible to antibody when the PM is selectively permeabilized with digitonin, but is readily detected when all intracellular membranes are permeabilized with Triton-X-100. These data confirm that mutating emerin’s TMD does not disrupt the protein’s membrane topology.

      Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group showed that C-terminally GFP-tagged Emerin protein traffics to the plasma membrane and reaches lysosomes for degradation. It is suggested that the C-terminal tagging of tail-anchored proteins shifts their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. The authors of this paper found that C-terminal GFP tagging causes Emerin to localize to the plasma membrane and eventually reach lysosomes. They investigated the mechanism by which Emerin-GFP moves to the secretory pathway. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors identify that an ER retention sequence and strong TMD hydrophobicity contribute to Emerin trafficking to the secretory pathway. Overall, the data are solid, and the knowledge will be useful to the field. However, the authors do not fully answer the question of why C-terminally GFP-tagged Emerin moves to the secretory pathway. Importantly, the authors did not consider the possible roles of GFP in the ER lumen influencing Emerin trafficking to the secretory pathway.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) The authors suggest that an ER retention sequence and high hydrophobicity of Emerin TMD contribute to its trafficking to the secretory pathway. However, these two features are also present in WT Emerin, which correctly localizes to the inner nuclear membrane. Additionally, the authors show that the ER retention sequence is normally obscured by the LEM domain. The key difference between WT Emerin and Emerin-GFP is the presence of GFP in the ER lumen. The authors missed investigating the role of GFP in the ER lumen in influencing Emerin trafficking to the secretory pathway. It is likely that COPII carrier vesicles capture GFP protein in the lumen as part of the bulk flow mechanism for transport to the Golgi compartment. The authors could easily test this by appending a KDEL sequence to the C-terminus of GFP; this should now redirect the protein to the nucleus.

      We agree with the reviewer’s point that the presence of lumenal GFP somehow promotes secretion of emerin from the ER, likely at the stage of enhancing its packaging into COPII vesicles. We struggle to think about how to interpret the KDEL tagging experiment that the reviewer proposes, as the KDEL receptor predominantly recycles soluble proteins from the Golgi to the ER, while emerin is a membrane protein; and we have shown that emerin already contains a putative COPI-interacting RRR recycling motif in its cytosolic domain.

      Nevertheless, we agree with the reviewer that it is worthwhile to test the possibility that addition of GFP to emerin’s C-terminus promotes capture by COPII vesicles. We have evaluated this question by performing temperature block experiments to cause cargo accumulation within stalled COPII-coated ER exit sites, then comparing the propensity of various untagged and tagged emerin variants to enrich in ER exit sites as judged by colocalization with the COPII subunit Sec31a. These data now appear in Figure 4 supplement 1. These experiments indicate that emerin-GFP samples ER exit sites significantly more than does untagged emerin. Further, the ER exit site enrichment of emerin-GFP is dampened by shortening emerin’s TMD. We do not see further enrichment of any emerin variant in ER exit sites when COPII vesicle budding is stalled by low temperature incubation, implying that emerin lacks any positive sorting signals that direct its selective enrichment in COPII vesicles. Altogether, these data indicate that both emerin’s long and hydrophobic TMD and the addition of a lumenal GFP tag increase emerin’s propensity to sample ER exit sites and undergo non-selective, “bulk flow” ER export.

      (2) The authors nicely demonstrate that the hydrophobicity of Emerin TMD plays a role in its secretory trafficking. I wonder if this feature may be beneficial for cells to degrade newly synthesized Emerin via the lysosomal pathway during mitosis, as the nuclear envelope breakdown may prevent the correct localization of newly synthesized Emerin. The authors could test Emerin localization during mitosis. Such findings could add to the physiological significance of their findings. At the minimum, they should discuss this possibility.

      We thank the reviewer for this insightful suggestion. It is attractive to speculate that secretory trafficking might enable lysosomal degradation of emerin during mitosis, when its lamin anchor has been depolymerized. However, we think it is unlikely that mitotic trafficking contributes significantly to the turnover flux of untagged emerin; if it did, we would expect to see higher steady state levels and/or slowed turnover of emerin mutants that cannot traffic to the lysosome. We did not observe this outcome. Instead, mutations that enhance (RA) or impair (TMDm) emerin trafficking had no effect on the untagged protein’s steady-state levels (Fig. 4G).

      Minor concerns:

      (1) On page 7, the authors note that "FLAG-RA construct was not poorly expressed relative to WR, in contrast with RA-GFP (Figures S3C, 2I)." The expression levels of these proteins cannot be compared across two different blots.

      We apologize for this confusion; we were implying two distinct comparisons to internal controls present on each blot. We have adjusted the text to read “FLAG-RA construct was not poorly expressed relative to FLAG-WT (Fig. S3C) in contrast to RA-GFP compared to WT-GFP (Fig. 2I).”

      (2) In the first paragraph of the discussion, the authors suggest that aromatic amino acids facilitate trafficking to lysosomes. However, they only replaced aromatic amino acids with alanine residues. If they want to make this claim, they should test other amino acids, particularly hydrophobic amino acids such as leucine.

      The reviewer may be inferring more import from our statement than we intended. We focused on these aromatic residues within the TMD because they contribute strongly to its overall hydrophobicity. Experimentally, we determined that nonconservative alanine substitutions of these aromatic residues inhibited trafficking. We do not state and do not intend to imply that the aromatic character of these residues specifically influences trafficking propensity, and we agree with the reviewer that to test such a question would require additional substitutions with non-aromatic hydrophobic amino acids.

      We realize that our phrasing may have been misleading by opening with discussion of the aromatic amino acids; in the revised discussion paragraph, we instead lead with discussion of TMD hydrophobicity, and then state how the specific substitutions we made affect trafficking.

      Reviewing Editor comments:

      While reviewer 1 did not provide any recommendations to the authors, I agree with this reviewer that the authors should validate the topology of their tagged proteins (at least for the one used to draw key conclusions). Given that Emerin is a tail-anchored protein, having a big GFP tag at the C-terminus could mess up ER insertion, causing the protein to take a wrong topology or even be mislocalized in the cytosol, particularly under overexpression conditions. In either case, it can be subject to quality control-dependent clearance via either autophagy, ERphagy, or ER-to-lysosome trafficking. I think that the authors should try a few straightforward experiments such as brefeldin A treatment or dominant negative Sar1 expression to test whether blocking conventional ER-to-Golgi trafficking affects lysosomal delivery of Emerin. I also think that the authors should discuss their findings in the context of the RESET pathway reported previously (PMID: 25083867). The ER stress-dependent trafficking of tagged Emerin to the PM and lysosomes appears to follow a similar trafficking pattern as RESET, although the authors did not demonstrate that Emerin traffics to lysosomes via the PM. In this regard, they should tone down their conclusion and discuss their findings in the context of the RESET pathway, which could serve as a model for their substrate.

      We agree that validating the topology of TMD mutants is important, and now include these experiments in the revised manuscript (please see our response to Reviewer 1 above).

      Please see our response to Reviewer 1's public review; we previously determined that emerin-GFP undergoes ER-to-Golgi trafficking (see our 2019 study).

      We recognize the major parallels between our findings and the RESET pathway. In our 2019 study, we found that similarly to other RESET cargoes, emerin-GFP travels through the secretory pathway, is exposed at the PM, and is then internalized and delivered to lysosomes. We discussed these strong parallels to RESET in our 2019 study. In this revised manuscript, we now also point out the parallels between emerin trafficking and RESET and cite the 2014 study by Satpute-Krishnan and colleagues (PMID 25083867).

    1. eLife Assessment

      This study shows, for the first time, the structure and snapshots of the dynamics of the full-length soluble Angiotensin-I converting enzyme dimer. The combination of structural and computational analyses provides compelling evidence that reveals the conformational dynamics of the complex and key regions mediating the conformational change. This fundamental work illustrates how conformational heterogeneity can be used to gain insights into protein function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites.

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods.

      Strengths:

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally, while lacking ground truth for validation.

      Weaknesses:

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on a qualitative analysis.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape.

    4. Reviewer #3 (Public review):

      Summary:

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction regions, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE.

      Strengths:

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology.

      Weaknesses:

      No weaknesses noted. The revised manuscript adequately addresses the points I suggested in the review of the first submission.

    5. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites.

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC-focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods.

      Strengths:

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally while lacking ground truth for validation.

      Weaknesses:

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps, Figure 2D). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on qualitative analysis. As the authors clearly pointed out, there are many other methods for continuous conformational heterogeneity analysis in cryo-EM. Among these methods, some allow analyzing particle images in terms of atomic models, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951), which result in one atomic model per particle image and can help in analyzing cooperativity of domain motions through measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). This could be discussed in the article.

      Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape. However, the strength of evidence supporting the conclusions needs refinement, particularly in defining key terms, improving structural validation, and ensuring consistency in data analysis. Addressing these points through major revisions will significantly improve the clarity, rigor, and accessibility of the study to a broader audience, allowing it to make a stronger impact in the field.

      Strengths:

      The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated.

      Weaknesses:

      Several aspects of the manuscript require further refinement to improve clarity and scientific rigor as detailed in my recommendations for the authors.

      Reviewer #3 (Public review):

      Summary:

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction region, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE.

      Strengths:

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology.

      Weaknesses:

      In general, the findings are supported by experimental data, but some experimental details and approaches could be improved. For example, cryoDRGN analysis is limited to the top 5 PCA components for ease of comparison with cryoSPARC 3DVA, but wouldn't an expansion to more components with cryoDRGN potentially identify further conformational states? The authors also say that they performed heterogeneity analysis on both datasets but only show data for one. The results for the first dataset should be shown and can be included in supplementary figures. In addition, the authors mention that they were not successful in performing cryoSPARC 3DFlex analysis, but they do not show their data or describe the conditions they used in the methods section. These data should be added and clearly described in the experimental section.

      Some cryo-EM data processing details are missing. Please add local resolution maps, box sizes, and Euler angle distributions and reference the initial PDB model used for model building.

      Reviewer #1 (Recommendations for the authors):

      Major point:

      The authors could discuss the use of continuous conformational heterogeneity analysis methods that analyze particle images in terms of atomic models, based on MD simulations, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951). MDSPACE can be used on a dataset preprocessed with cryoSPARC or Relion by discrete classification to reduce compositional heterogeneity and obtain initial particle poses. It results in one atomic model per particle image and can help in analyzing the cooperativity of domain motions by measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371).

      We agree that MDSPACE is a promising and useful tool for analysis, and are excited to implement such a method. Prior to manuscript submission, we had discussions with the primary author, Slavica Jonic, about how we might employ her software in our analysis. Unfortunately, we were unable to overcome significant computational issues, notably MDSPACE's lack of GPU functionality, which prevented us from employing MDSPACE in a reasonable manner for our dataset. We hope to employ MDSPACE in future work, once the computational issues have been addressed, and have added a section on MDSPACE to the discussion, as we feel it is an exciting approach that deserves more visibility. The added text reads as follows:

      line 565-574

      Similarly, MDSPACE holds tremendous promise as a method for investigating conformational dynamics from cryo-EM data (61). MDSPACE integrates cryo-EM particle data with short MD simulations to fit atomic models into each particle image through an iterative process which extracts dynamic information. However, the lack of GPU-enabled processing for MDSPACE requires either a dedicated computational setup that diverges from most other cryo-EM software, or access to a CPU-based supercomputer, which severely limits the accessibility of such software. Despite these challenges, both 3DFlex and MDSPACE use promising approaches to study protein conformational dynamics. We look forward to exploring effective methods to incorporate these strategies into our future research.

      Minor points:

      (1) Lines 348-350: "The discrepancy in population size between these clusters is likely due to bias in the initial particle poses, rather than a subunit-specific preference for the open state." Which bias? The cluster size is related to conformations, not to poses.

      We hope to emphasize that the assignment of particles to either the OC or CO cluster is likely due to the particle orientation within the complete dimer refinement, and the discrepancy in size between OC and CO clusters does not necessarily indicate a domain specific preference for one state or another, which would carry allosteric implications. This remains a possibility, but we hope to avoid over-interpretation of our results with the statement above.

      The statement was altered to now read:

      Line 418-423

      “The discrepancy in population size between these clusters is likely due to bias in the initial particle orientation, rather than a subunit-specific preference for the open state. As the O/C state and the C/O state are 180 degree rotations of each other, particle assignment to either cluster is likely influenced by the initial particle orientation of the complete dimer, and we currently lack the data to discern any allosteric implication to the orientation assignment.”

      (2) Line 519: "Micrographs with a max CTF value worse than 4 Å were removed from the dataset,..." (also, lines 822-823 in supplementary material).

      Do you want to say that micrographs with a resolution worse than 4 Å were removed?

      Max CTF value was replaced with CTF fit resolution to properly match the parameter used in cryoSPARC.

      (3) Figure 2C: The black lines are barely visible. Can you make them thicker and in red color?

      The figure has been amended.

      (4) Figure 2D: The values for Chain A and Chain B in the second row (ACE-C) of sACE-3.05 columns are 17.9 (I) (Chain A) and 13.9 (C) (Chain B). Shouldn't they be reversed (13.9 (C) (Chain A) and 17.9 (I) (Chain B))?

      The values are now correct. sACE-3.65 chains were flipped in the table, and the updated color scheme should make it easier to map the values from the table to their corresponding structure.

      Reviewer #2 (Recommendations for the authors):

      The manuscript presents the first complete full-length dimeric ACE structure. The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated. However, several aspects of the manuscript require further refinement to improve clarity and scientific rigor. In the view of this reviewer, a major revision is necessary. Please see the detailed comments below:

      (1) Definition of "Conformational Heterogeneity": The term "conformational heterogeneity" should be clearly defined when citing references 27-29.

      References 27 and 29 use MD simulations, which reveal "conformational flexibility" rather than "conformational heterogeneity" as observed in cryo-EM data. A more precise distinction should be made.

      We have changed the term “conformational heterogeneity” to the broader “conformational dynamics.”

      (2) Figure Adjustments for Clarity:

      Figure 1B: A scale bar is needed for accurate representation.

      A 100 Angstrom scale bar was added to figure 1B.

      Figure 2A, B: Using a Cα trace representation would improve clarity and make structural differences more apparent.

      We found using a Cα trace representation makes the figure too confusing and impossible to determine individual structural elements. Everything just becomes a jumble of lines.

      Additionally, a Cα displacement vs. residue index plot (with Figure 1A placed along the x-axis) should be included alongside Figures 2A and B to provide quantitative insight into structural variations.

      This analysis has been combined with several other suggestions and now comprises a new figure 4.

      (3) Structural Resolution and Validation:

      Euler angle distribution and 3D-FSC analysis should be provided to help the audience assess how these factors influence the resolution of each structure.

      Local resolution analysis in Relion should be included to determine if there are dynamic differences among the four structures.

      To enhance structural interpretation, the manuscript would benefit from showcasing examples of bulky side-chain densities (e.g., Trp, Phe, Tyr) for each of the four structures.

      This information is included in Figures S3 and S5.

      (4) Glycan Modeling Considerations:

      Since the resolution of cryo-EM does not allow for precise glycan composition determination, additional experimental validation (e.g., Glyco-MS) would strengthen the modeling. If experimental support is unavailable, appropriate references should be cited to justify the modeled glycans.

      Minimal glycan modeling was performed with the goal of demonstrating that the protein is glycosylated. We have highlighted in the manuscript that we chose 12 N-linked glycosylation sites showing extra density, an indication that a glycan should be present, and modeled them with complex glycans.

      (5) Advanced Cryo-EM and MD Analyses: 3DFlex Analysis:

      It is recommended that the authors explore 3DFlex to better capture conformational variability. CryoSPARC's community support can assist in proper implementation.

      We have incorporated our 3DFlex analysis in our discussion as follows:

      Line 553-565

      Surprisingly, we did not observe such motion using cryoSPARC 3DFlex, a neural network-based method analyzing our cryo-EM data of sACE (54). Central to the working of cryoSPARC 3DFlex is the generation of a tetrahedral mesh used to calculate deformations within the particle population. Proper generation of the mesh is critical for obtaining useful results and must often be determined empirically. Despite several attempts, we were unable to obtain results from 3DFlex comparable to what we observed with our other methods. Even using the results from our 3DVA as prior input to 3DFlex, the largest conformational change we observed was a slight wiggling at the bottom of the D3a subdomain (Movie S12). The authors of 3DFlex note that 3DFlex struggles to model intricate motions, and the implementation of custom tetrahedral meshes currently requires a non-cyclical fusion strategy between mesh segments. Given these limitations, and the complexity of sACE conformational dynamics, it appears that sACE, as a system, is not well-suited to analysis via 3DFlex in its current implementation.

      (6) Movie Consistency:

      The MD simulation movies should use the same color coding as the first four movies for consistency. Similarly, the 3DVar analysis map should be color-coded to enhance interpretability.

      MD simulation movies are re-colored.

      (7) MD Simulations - Data Extraction and Validation:

      The manuscript includes several long-timescale MD simulations, but further analysis is needed to extract meaningful dynamic information. Suggested analyses include:

      a. RMSF (Root Mean Square Fluctuation) Analysis: Calculate RMSF from MD trajectories and compare it with local resolution variations in cryo-EM maps.

      RMSF values were included in the new figure 4 along with structural depictions colored by RMSF value to localize variation to the structure.

      b. Assess whether regions exhibiting lower dynamics correspond to higher resolution in cryo-EM.

      Information is added to Figure 4, Figure S3, S5, S6.

      c. Compare RMSF between simulations with and without glycans to identify potential effects.

      This has been done in Figure 4.

      d. Clustering Analysis: Use the four solved structures as reference states to cluster MD simulation trajectories. Determine if the population states observed in MD simulations align with cryo-EM findings.

      This has been done in supplementary figure S10.

      e. Principal Component Analysis (PCA): Perform PCA on MD trajectories and compare with dynamics inferred from cryo-EM analyses (3DVar, cryoDRGN, and RECOVAR) to ensure consistency.

      This has been done in supplementary figure S11.

      f. Correction of RMSF Analysis or the y-axis label in Figure S9: The RMSF values cannot be negative by definition. The authors should carefully review the code used for this calculation or explicitly define the metric being measured.

      The Y-axis label has been corrected to clarify that the plot depicts the change in RMSF values when comparing the glycosylated and non-glycosylated MD simulations.
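
      For readers less familiar with the metric, the distinction drawn in this exchange can be illustrated with a minimal sketch (this is not the authors' analysis code). RMSF is the square root of a mean of squared deviations and so can never be negative, whereas the difference between two RMSF profiles can be. The function names `rmsf` and `delta_rmsf` and the toy coordinates below are hypothetical, and the frames are assumed to be already aligned to a common reference.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames -> atoms -> (x, y, z).

    Assumes all frames were already superposed onto a common reference.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = [
            sum(frame[atom][k] for frame in trajectory) / n_frames
            for k in range(3)
        ]
        # Mean squared deviation from the average position.
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def delta_rmsf(traj_a, traj_b):
    """Per-atom RMSF difference (a minus b); unlike RMSF, this can be negative."""
    return [a - b for a, b in zip(rmsf(traj_a), rmsf(traj_b))]


# Hypothetical toy data: one atom, two frames per simulation.
glycosylated = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
non_glycosylated = [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]

print(rmsf(glycosylated))
print(delta_rmsf(glycosylated, non_glycosylated))
```

      In this toy case the atom fluctuates less in the first trajectory, so the difference is negative even though each RMSF value is positive. A y-axis showing such a quantity is a change in RMSF, not RMSF itself.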

      (8) Discussion on Coordinated Motion and Allostery:

      The discussion of coordinated motion and allosteric regulation between sACE-N domains should be explicitly connected to experimental evidence mentioned in the introduction:

      "Enzyme kinetics analysis suggests negative cooperativity between two catalytic domains (31-33). However, ACE also exhibits positive synergy toward Aβ cleavage and allostery to enhance the activity of its binding partner, the bradykinin receptor (11, 34)."

      (9) The authors should elaborate on how their new insights provide a mechanistic explanation for these experimental observations.

      (10) Connection to Therapeutic Implications:

      The discussion section should more explicitly connect the structural findings to potential therapeutic applications, which would significantly enhance the impact of the study.

      These three points (8-10) were addressed in a significant overhaul to the discussion section.

      In summary, this study makes a valuable contribution to the field of ACE structural biology and dynamics. The combination of cryo-EM and MD simulations is particularly powerful, and with major revisions, this manuscript has the potential to make a strong impact. Addressing the points outlined above will significantly improve clarity, strengthen the scientific claims, and enhance the manuscript's accessibility to a broader audience. I appreciate the authors' rigorous approach to this complex topic and encourage them to refine their work to fully highlight the significance of their findings.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors incorrectly refer to their ACE construct as full-length throughout the manuscript. Given that they are purifying the soluble region (aa 1-1231), saying full-length ACE is not the correct nomenclature. I suggest removing full-length and using soluble ACE (sACE) throughout the text.

      We utilize the term full-length to highlight the fact that our structures contain both the N and C domains for both subunits in the dimer, in contrast to the previously published ACE cryo-EM structure. We have clarified in the text that we refer to the full-length soluble region of ACE (sACE), and sACE is used to specifically refer to our construct throughout the text, except when referring to ACE in a more generalized biological context in the introduction and discussion.

      (2) The authors could show differences between the different structural states by measuring and displaying the alpha carbon distances. For example, in Figures 2A, B, 3A, and 4B and C.

      Alpha carbon displacements for each residue have been added to the new figure 4.

      (3) Most figures, with a few exceptions (Figures 2 and S11), are of low quality. Perhaps they are not saved in the same format. In addition, the color schemes used throughout the figures and movies are not consistent. For example, in Figure 1 D2 domains are in green, while they appear yellow in Figure 2 and later. Please double-check all coloring schemes and keep them consistent throughout the manuscript. In addition, it would be good to keep the labeling of the domains in the subsequent figures, as it is difficult to remember which domain is which throughout the manuscript.

      We are unsure of how to address the low quality issue; our files and the online versions appear to be of suitable high quality. We will work with editorial staff to ensure all files are of suitable quality. The color scheme has been revised throughout the manuscript to ensure consistency and better differentiate between domains and chains.

      (4) Figure 1. Indicate exactly where in panel A ACE-N ends and ACE-C starts. Also, the pink and magenta, as well as aqua vs. light blue, are hard to distinguish.

      We have updated the coloring scheme.

      (5) Figure 2. In the figure legend, the use of brackets for defining closed, intermediate, and open states is confusing, given that the panels are also described with brackets, and some letters match between them. Using a hyphen or bolding the abbreviations could help. Also, define chains A and B, make the black lines that I assume indicate distances in C bold or thicker as they are very hard to see in the figure, and add to the legend what those lines mean.

      The abbreviations have been changed from parentheses to quotes, and suggestions have been implemented.

      (6) Figure 4 is confusing as shown. Since the authors mention the general range of motion in sACE-N first in the text, wouldn't it make more sense to show panel B first and then panel A? Also, can you point and label the "tip connecting the two long helices of the D1a subdomain" in the figure? It is not clear to me where this region is in B. In addition, add a description of the arrows in B and C to the figure legend.

      Most changes incorporated. The order should make more sense now in light of other changes.

      (7) Figure 5. Can the authors add a description to the legend as to what the arrows indicate and their thickness?

      Done

      (8) Add a scale bar to the micrograph images in the supplementary figures.

      Figure S2 and S4 need the scale bar.

      (9) Provide a more comprehensive description of buffers used in the DF analysis, as this information could be useful to others.

      We have included the data in Table S1.

      (10) Line 51: Reference format not consistent with other references: (Wu et al., 2023).

      Fixed

      (11) Line 66: Define "ADAM".

      The definition has been added.

      (12) Line 90: The authors say: "Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal the conformational heterogeneity, though it remains under-studied (27-29)." Can the authors clarify what "it" refers to? The full-length ACE, sACE, or its specific domains?

      The sentence now reads: Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal ACE conformational dynamics, though they remain under-studied (29-31).

      (13) Line 204: "The comparison of our dimeric sACE cryoEM structures of reveals the conformational dynamics of sACE catalytic domains." The second "of" should be removed.

      Fixed

      (14) Line 268: "From room mean square fluctuation (RMSF) analysis..." "room" should be replaced with "root."

      Fixed

    1. eLife Assessment

      Arecchi et al. demonstrate that polarized second-harmonic generation microscopy can be used to probe the ON/OFF states of myosin in both permeabilized and intact muscle, making this key measurement accessible to a greater number of labs. This has the potential to help with the study of disease-causing mutations and our understanding of drug function. The methodology is well defined, and the results are important; however, whilst this is overall a convincing study, there are some limitations to the interpretation of the data.

    2. Reviewer #1 (Public review):

      Summary:

      This study utilizes polarized second-harmonic generation (pSHG) microscopy to investigate myosin conformation in the relaxed state, distinguishing between the disordered, actin-accessible ON state and the ordered, energy-conserving OFF state. By pharmacologically modulating the ON/OFF equilibrium with a myosin activator (2-deoxyATP) and inhibitor (Mavacamten), the authors demonstrate that pSHG can sensitively quantify the ON/OFF ratio in both skeletal and cardiac muscle. Validation with X-ray diffraction supports the accuracy of the method. Applying this approach to a hypertrophic cardiomyopathy model, the study shows that R403Q/MYH7-mutated minipigs exhibit an increased ON state fraction relative to controls. This difference is eliminated under saturating concentrations of myosin modulators, indicating that the ON/OFF balance can be pharmacologically shifted to its extremes. Additionally, ATPase assays reveal elevated resting ATPase activity in R403Q samples, which persists even when the ON state is saturated, suggesting that increased energy consumption in this mutation is driven by both a shift toward the ON state and inherently higher myosin ATPase activity.

      Strengths:

      This is a well-written and well-conducted study that clearly reveals the power of SHG microscopy. The study clearly establishes the great utility of SHG to study thick filament regulation.

      Weaknesses:

      (1) Several studies have shown that the ON state of the thick filament is sensitive to both temperature and filament lattice spacing, with a common recommendation to conduct skinned fiber experiments at temperatures above 27 °C and in the presence of dextran to better preserve physiological conditions. The authors should clarify the experimental temperature used in their skinned fiber studies, indicate whether dextran was included, and discuss whether adherence to these recommended conditions would have impacted their results.

      (2) On page 13, the authors report the proportion of disordered heads as approximately 30% in wild-type and 65% in R403Q fibers. They should clarify whether these values represent the percentage of total myosin heads, or rather the percentage of heads that are responsive to Mavacamten and dATP.

      (3) In Figure 5, regarding ATPase measurements, the content of contractile material per unit volume of muscle preparation will influence the results. Did the authors account for this variable, and if not, how might it have affected the conclusions?

      (4) For readers primarily interested in assessing the ON/OFF state of thick filaments, could the authors list the specific advantages of polarized second harmonic generation (pSHG) microscopy compared to X-ray diffraction?

      (5) Given that many data points were derived from the same fiber or myocyte, how did the authors address the risk of type I errors due to non-independence of measurements? Was a nested or hierarchical statistical approach used?

    3. Reviewer #2 (Public review):

      Summary:

      In striated muscle, myosin motors can dynamically switch between an energy-conserving OFF state and an activated ON state. This switching is important for meeting the body's needs under different physiological conditions, and previous studies have shown that disease-causing mutations associated with cardiomyopathies can affect the population of these states, leading to aberrant contractility. Studying these structural states in muscle has previously only been possible via X-ray diffraction, which requires access to a beam line. Here, Arecchi et al. demonstrate that polarized second-harmonic generation microscopy (pSHG), a technique that is more accessible, can be used to probe the ON/OFF states of myosin in both permeabilized and intact muscle.

      Strengths:

      (1) There is an outstanding need in the field to better understand the regulation of the ON/OFF states of myosin. Currently, this is studied using X-ray diffraction, meaning that it is accessible to only a few labs. The authors demonstrate that pSHG can be used to probe the ON/OFF states of myosin both in intact and permeabilized muscle. This is a significant advance, since it makes it possible to study these states in a standard research laboratory.

      (2) The authors demonstrate that this approach can be employed in both skeletal and cardiac muscle. Importantly, it works with both porcine and mouse cardiac muscle, which are two of the most important animal models for preclinical studies.

      (3) The authors manipulate the ON/OFF equilibrium using both drugs and a genetic model of hypertrophic cardiomyopathy that has been shown to modulate the ON/OFF equilibrium. Their results generally agree with previous studies conducted using X-ray diffraction as well as biochemical measurements of myosin autoinhibition.

      Weaknesses:

      (1) While the application of pSHG to the ON/OFF equilibrium is an important advance, there are limited new biological insights since the perturbations used here have been extensively characterized in previous studies.

      (2) SHG has previously been applied to study the nucleotide-dependent orientation of myosin motors in the sarcomere (PMID: 20385845). The authors have previously interpreted the value of gamma as being a readout of lever arm position, but here, it is interpreted as a measure of ON/OFF equilibrium. When this technique is applied to intact muscle, it is not clear how to deconvolve the contributions of lever arm angle from the ON/OFF population (especially where there is a mix of states that give rise to the gamma value). This is an important limitation that is not discussed in the manuscript.

      (3) The R403Q mutation has previously been shown to cause an increase in ATP usage. Here, the authors measure an elevated basal ATPase rate under relaxing conditions, and they interpret this as showing increased myosin ATPase activity intrinsic to the motors; however, care should be used in interpreting these results. Work from the Spudich lab has shown that the R403Q mutation can appear as increasing motor function in some assays but depressing motor function in others (see PMID: 32284968, 26601291). Moreover, the actin-activated ATPase rate is an order of magnitude higher than the basal ATPase rate, and thus, small changes in the basal ATPase rate are unlikely to be important for physiology.

      (4) The authors interpret some of their data based on the assumption that the high concentrations of drugs cause the myosin to either adopt 100% OFF or ON states. This assumption is not validated, limiting the ability to interpret the fraction of myosins in the ON/OFF states.

      (5) The ATPase measurements are innovative but hard to interpret. dATP and ATP do not have identical ATPase kinetics, meaning that it is hard to deconvolve whether the elevated ATPase rate with dATP is due to changes in the ON/OFF population and/or intrinsic ATPase activity. Similarly, mavacamten reduces the rate of phosphate release from myosin, and this effect is not strictly coupled to the formation of the OFF state (e.g., see PMID: 40118457). As such, it is difficult to deconvolve drug-based changes in the inherent ATPase kinetics of the myosin from changes in the OFF-state population.

    4. Reviewer #3 (Public review):

      Summary:

      This is a very interesting paper extending the use of SHG to the study of relaxed muscle and its use to assess the order-disorder (and on/off) states of myosin heads in the thick filament. The work convincingly shows that SHG and the parameter gamma provide a reliable measure of the state of the myosin heads in a range of different relaxed muscle fibres, both intact and skinned, and in myofibrils. In mini pig cardiac fibres, the use of dATP and mavacamten increased or decreased the number of heads in the disordered state, respectively. On the assumption that these treatments push myosins fully into the disordered or ordered state, this allows the fraction of ordered heads to be assessed under a wide variety of conditions. It is unfortunate that dATP treatment was not used (as mavacamten was) on rabbit psoas and mouse samples to further test this hypothesis.

      The results with the myosin mutant R403Q support the idea that this mutation reduces the fraction of myosin heads in the ordered state and that mavacamten can recover the WT situation.

      The results from SHG were compared with parallel studies using X-rays to validate the conclusions. Independent fibre ATPase data further support the conclusions.

      The work is solid and provides a novel approach to assessing the activity state of muscle thick filaments. The authors point out some of the potential uses of this approach in the future, including time-resolved SHG measurements. Indeed, jumps in mavacamten or dATP concentration with time-resolved SHG could measure the rates of entry and exit from the ordered, off state of the filament. Such a measurement is urgently needed in the field.

      Strengths:

      (1) The SHG signal is convincingly shown to assess the fraction of ordered/disordered myosin heads in the thick filament of a variety of muscle fibres.

      (2) The results are similar for rabbit psoas, mouse, and minipig cardiac fibres. Skinning the fibres and production of myofibrils do not change the SHG signal.

      (3) Use of myosin R403Q mutant in mini pig confirms a loss of ordered myosin heads, and the ordered heads can be recovered by mavacamten.

      (4) Parallel X-ray scattering and ATPase data support the conclusions.

      (5) Assuming that dATP and mavacamten generate 100% disordered vs ordered myosin heads respectively, then the percentage of ordered heads can be calculated for a variety of conditions.
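      The calculation referred to in point (5) can be illustrated with a minimal sketch. It assumes a linear two-state model in which the measured gamma is a population-weighted mix of a fully disordered reference (saturating dATP) and a fully ordered reference (saturating mavacamten). The linearity, the function name, and the numbers in the usage line are illustrative assumptions, not the authors' stated procedure or data.

```python
def ordered_fraction(gamma, gamma_disordered, gamma_ordered):
    """Estimate the fraction of ordered heads by linear interpolation.

    Assumes (hypothetically) that gamma varies linearly between the
    fully disordered and fully ordered reference values.
    """
    span = gamma_ordered - gamma_disordered
    if span == 0:
        raise ValueError("reference gamma values must differ")
    return (gamma - gamma_disordered) / span


# Hypothetical, unitless values chosen only to show the arithmetic.
fraction = ordered_fraction(gamma=0.5, gamma_disordered=0.4, gamma_ordered=0.6)
print(f"ordered fraction: {fraction:.2f}")
```

      If either drug falls short of saturating its state, the reference values shift and the estimated fraction shifts with them, which is why the 100% assumption matters for the interpretation.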

      Weaknesses:

      (1) Issues like the effect of fibre disarray and lattice spacing on the SHG signal are not well defined.

      (2) The now well-defined heterogeneity of thick filament structure is not acknowledged.

      (3) dATP was only used on minipig cardiac fibres. The effect of dATP on rabbit psoas and mouse cardiac fibres would be a useful comparison and would help validate the calculation of % ordered heads.

    1. eLife Assessment

      This important study demonstrates that yeast populations can rapidly evolve freeze-thaw tolerance by converging on a trehalose-rich, quiescence-like state, illuminating a general physiological route to extreme-stress adaptation. The evidence is solid, combining rigorous experimental-evolution design with multi-scale phenotyping, biophysical measurements, whole-genome sequencing, and quantitative modeling that together support the mechanistic conclusions. Questions remain about the novelty relative to prior growth/stress tolerance links, the precise genetic versus non-genetic drivers of trehalose up-regulation, and the breadth of independently evolved lines. These are areas for clarification, but they do not substantially weaken the overall contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      Weaknesses:

      I have questions regarding the overall novelty of the proposal, which I would like the authors to explain.

      (1) Earlier papers have shown that loss of ribosomal proteins, which slows growth, leads to better stress tolerance in S. cerevisiae. Given this, isn't it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria, or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      (2) Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. In my view, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example, in this case it would be what guides the over-accumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      (3) Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well-known in literature to allow freeze tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of "non-genetic" adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome, namely upregulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines; the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, do the cells fall back to being similarly only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states.

      This is a point in continuation of point #2, and I would like to understand what I have missed.

      (4) To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      (5) For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2% to ~70% in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about threefold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with shorter lag, traits that together protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      The strengths of the paper are the experimental design, data presentation and interpretation, and that it is well-written.

      Weaknesses:

      (1) While the phenotyping is thorough, a few more growth curves would be quite revealing to determine the extent of cross-stress protection. For example, comparing growth rates under YPD vs. YPEG (EtOH/glycerol), and measuring growth at 37 °C or in the presence of 0.8 M KCl.

      (2) Is GEMS integrated prior to evolution? Are the evolved cells transformable?

      (3) From the table, it looks like strains either have mutations in Ras1/2 or Vac8. Given the known requirements of Ras/PKA signaling for the G1/S checkpoint (to make sure there are enough nutrients for S phase), this seems like a pathway worth mentioning and referencing. Regarding Vac8, its emerging roles in NVJ and autophagy suggest another nutrient checkpoint, perhaps through TORC1. The common theme is rewired metabolism, which is probably influencing the carbon shuttling to trehalose synthesis.

    1. eLife Assessment

      This study reports the important development and characterization of next-generation analogs of the molecule AA263, which was previously identified for its ability to promote adaptive ER proteostasis remodeling. The evidence supporting the conclusions is convincing, with rigorous assays used to benchmark the changes in potency and efficacy of the AA263 analogs as well as AA263 targets. The ability of AA263 analogs to restore the loss of function associated with disease-associated proteins prone to misfolding will be of interest to pharmacologists, chemical biologists, and cell biologists, as well as those working on protein misfolding disorders.

    2. Reviewer #1 (Public review):

      Summary:

      This study builds off prior work that focused on the molecule AA147 and its role as an activator of the ATF6 arm of the unfolded protein response. In prior manuscripts, AA147 was shown to enter the ER, covalently modify a subset of protein disulfide isomerases (PDIs), and improve ER quality control for the disease-associated mutants of AAT and GABAA. Unsuccessful attempts to improve the potency of AA147 have led the authors to characterize a second hit from the screen in this study: the phenylhydrazone compound AA263. The focus of this study on enhancing the biological activity of the AA147 molecule is compelling, and overcomes a hurdle of the prior AA147 drug that proved difficult to modify. The study successfully identifies PDIs as a shared cellular target of AA263 and its analogs. The authors infer, based on the similar target hits previously characterized for AA147, that PDI modification accounts for a mechanism of action for AA263.

      Strengths:

      The authors are able to establish that, like AA147, AA263 covalently targets ER PDIs. The work establishes the ability to modify the AA263 molecule to create analogs with more potency and efficacy for ATF6 activation. The "next generation" analogs are able to enhance the levels of functional AAT and GABAA receptors in cellular models expressing the Z-variant of AAT or an epilepsy-associated variant of the GABAA receptor, outlining the therapeutic potential for this molecule and laying the foundation for future organism-based studies.

      Weaknesses:

      Arguably, the work does not fully support the statement provided in the abstract that the study "reveals a molecular mechanism for the activation of ATF6". The identification of targets of AA263 and its analogs is clear. However, it is a presumption that the overlap in PDIs as targets of both AA263 and AA147 means that AA263 works through the PDIs. While a likely mechanism, this conclusion would be bolstered by establishing that knockdown of the PDIs lessens drug impact with respect to ATF6 activation. Alternatively, it has previously been suggested that the cell-type dependent activity of AA263 may be traced to the presence of cell-type specific P450s that allow for the metabolic activation of AA263 or cell-type specific PDIs (Plate et al 2016; Paxman et al 2018). If the PDI target profile is distinct in different cell types, and these target differences correlate with ATF6-induced activity by AA263, that would also bolster the authors' conclusion.

    3. Reviewer #2 (Public review):

      Modulating the UPR by pharmacological targeting of its sensors (or regulators) provides mostly uncharted opportunities in diseases associated with protein misfolding in the secretory pathway. Spearheaded by the Kelly and Wiseman labs, ATF6 modulators were developed in previous years that act on ER PDIs as regulators of ATF6. However, hurdles in their medicinal chemistry have hampered further development. In this study, the authors provide evidence that the small molecule AA263 also targets and covalently modifies ER PDIs, with the effect of activating ATF6. Importantly, AA263 turned out to be amenable to chemical optimization while maintaining its desired activity. Building on this, the authors show that AA263 derivatives can improve the aggregation, trafficking, and function of two disease-associated mutants of secretory pathway proteins. Together, this study provides compelling evidence for AA263 (and its derivatives) being interesting modulators of ER proteostasis. Mechanistic details of its mode of action will need more attention in future studies that can now build on this.

      In detail, the authors provide strong evidence that AA263 covalently binds to ER PDIs, which will inhibit the protein disulfide isomerase activity. ER PDIs regulate ATF6, and thus their finding provides a mechanistic interpretation of AA263 activating the UPR. It should be noted, however, that AA263 shows broad protein labeling (Figure 1G), which may suggest additional targets, beyond the ones defined as MS hits in this study. Also, a further direct analysis of the IRE1 and PERK pathways (activated or not by AA263) would have been a benefit, as e.g., PDIA1, a target of AA263, directly regulates IRE1 (Yu et al., EMBO J, 2020), and other PDIs also act on PERK and IRE1. The authors interpret modest activation of IRE1/PERK target genes (Figure 2C) as an effect on target gene overlap, indeed the most likely explanation based on their selective analyses on IRE1 (ERdj4) and PERK (CHOP) downstream genes, but direct activation due to the targeting of their PDI regulators is also a possible explanation. Further key findings of this paper are the observed improvement of AAT behavior and GABAA trafficking and function. Further strength to the mechanistic conclusion that ATF6 activation causes this could be obtained by using ATF6 inhibitors/knockouts in the presence of AA263 (as the target PDIs may directly modulate the behavior of AAT and/or GABAA). Along the same line, it also warrants further investigation why the different compounds, even if all were used at concentrations above their EC50, had different rescuing capacities on the clients.

      Together, the study now provides a strong basis for such in-depth mechanistic analyses.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to develop and characterize phenylhydrazone-based small molecules that selectively activate the ATF6 arm of the unfolded protein response by covalently modifying a subset of ER-resident PDIs. The authors identify AA263 as a lead scaffold and optimize its structure to generate analogs with improved potency and ATF6 selectivity, notably AA263-20. These compounds are shown to restore proteostasis and functional expression of disease-associated misfolded proteins in cellular models involving both secretory (AAT-Z) and membrane (GABAA receptor) proteins. The findings provide valuable chemical tools for modulating ER proteostasis and may serve as promising leads for therapeutic development targeting protein misfolding diseases.

      Strengths:

      (1) The study presents a well-defined chemical biology framework integrating proteomics, transcriptomics, and disease-relevant functional assays.

      (2) Identification and optimization of a new electrophilic scaffold (AA263) that selectively activates ATF6 represents a valuable advance in UPR-targeted pharmacology.

      (3) SAR studies are comprehensive and logically drive the development of more potent and selective analogs such as AA263-20.

      (4) Functional rescue is demonstrated in two mechanistically distinct disease models of protein misfolding (one involving a secretory protein and the other a membrane protein), underscoring the translational relevance of the approach.

      Weaknesses:

      (1) ATF6 activation is primarily inferred from reporter assays and transcriptional profiling; however, direct evidence of ATF6 cleavage is lacking.

      (2) While the mechanism involving PDI modification and ATF6 activation is plausible, it remains incompletely characterized.

      (3) No in vivo data are provided, leaving the pharmacological feasibility and bioavailability of these compounds in physiological systems unaddressed.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1:

      Comment:

      The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).

      Strengths:

      A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.

      We sincerely appreciate the reviewer’s recognition of our efforts in employing a multi-method approach, which integrates three complementary experimental paradigms, each leveraging distinct neurophysiological techniques to provide converging evidence.

      In Experiment 1, we found that the degree of inhibition in the pMTG and LIFG was strongly associated with the overlap in gesture-speech representations, as quantified by mutual information. Experiment 2 revealed the time-sensitive dynamics of the pMTG-LIFG circuit in processing both unisensory (gesture or speech) and multisensory information. Experiment 3, utilizing high-temporal-resolution EEG, independently replicated the temporal dynamics of gesture-speech integration observed in Experiment 2, further validating our findings.

      The striking convergence across these methodologically independent approaches significantly bolsters the robustness and generalizability of our conclusions regarding the neural mechanisms underlying multisensory integration.

      Comment 1: I thank the authors for their careful responses to my comments. However, I remain not convinced by their argumentation regarding the specificity of their spatial targeting and the time-windows that they used.

      The authors write that since they included a sham TMS condition, that the TMS selectively disrupted the IFG-pMTG interaction during specific time windows of the task related to gesture-speech semantic congruency. This to me does not show anything about the specificity of the time-windows itself, nor the selectivity of targeting in the TMS condition.

      (1) Selection of brain regions (IFG/pMTG)

      We thank the reviewer for their thoughtful consideration. The choice of the left IFG and pMTG as regions of interest (ROIs) was informed by a meta-analysis of fMRI studies on gesture-speech integration, which consistently identified these regions as critical hubs (see Author response table 1 for detailed studies and coordinates).

      Author response table 1.

      Meta-analysis of previous studies on gesture-speech integration.

      Based on the meta-analysis of previous studies, we selected the IFG and pMTG as ROIs for gesture-speech integration. The rationale for selecting these brain regions is outlined in the introduction in Lines 63-66: “Empirical studies have investigated the semantic integration between gesture and speech by manipulating their semantic relationship[15-18] and revealed a mutual interaction between them[19-21] as reflected by the N400 latency and amplitude[14] as well as common neural underpinnings in the left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG)[15,22,23].”

      And further described in Lines 77-78: “Experiment 1 employed high-definition transcranial direct current stimulation (HD-tDCS) to administer Anodal, Cathodal and Sham stimulation to either the IFG or the pMTG”. And Lines 85-88: “Given the differential involvement of the IFG and pMTG in gesture-speech integration, shaped by top-down gesture predictions and bottom-up speech processing[23], Experiment 2 was designed to assess whether the activity of these regions was associated with relevant informational matrices”.

      In the Methods section, we clarified the selection of coordinates in Lines 194-200: “Building on a meta-analysis of prior fMRI studies examining gesture-speech integration[22], we targeted Montreal Neurological Institute (MNI) coordinates for the left IFG at (-62, 16, 22) and the pMTG at (-50, -56, 10). In the stimulation protocol for HD-tDCS, the IFG was targeted using electrode F7 as the optimal cortical projection site[36], with four return electrodes placed at AF7, FC5, F9, and FT9. For the pMTG, TP7 was selected as the cortical projection site[36], with return electrodes positioned at C5, P5, T9, and P9.”

      The selection of IFG or pMTG as integration hubs for gesture and speech has also been validated in our previous studies. Specifically, Zhao et al. (2018, J. Neurosci) applied TMS to both areas. Results demonstrated that disrupting neural activity in the IFG or pMTG via TMS selectively impaired the semantic congruency effect (reaction time costs due to semantic incongruence), while leaving the gender congruency effect unaffected.

      These findings identified the IFG and pMTG as crucial hubs for gesture-speech integration, guiding the selection of brain regions for our subsequent studies.

      (2) Selection of time windows

      The five key time windows (TWs) analyzed in this study were derived from our previous TMS work (Zhao et al., 2021, J. Neurosci), where we segmented the gesture-speech integration period (0–320 ms post-speech onset) into eight 40-ms windows. This interval aligns with established literature on gesture-speech integration, particularly the 200–300 ms window noted by the reviewer. As detailed in Lines 776-779: “Procedure of Experiment 2. Eight time windows (TWs, duration = 40 ms) were segmented in relative to the speech IP. Among the eight TWs, five (TW1, TW2, TW3, TW6, and TW7) were chosen based on the significant results in our prior study[23]. Double-pulse TMS was delivered over each of the TW of either the pMTG or the IFG”.
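      The segmentation described above can be sketched as follows. This is only an illustration of the arithmetic: it takes the 0–320 ms interval and 40-ms width at face value from this response, and the function name, the tuple layout, and the choice of zero as the reference point are assumptions (the quoted manuscript text locks the windows to the speech IP).

```python
def time_windows(start_ms=0, end_ms=320, width_ms=40):
    """Split [start_ms, end_ms) into consecutive labelled windows.

    Returns a list of (label, window_start_ms, window_end_ms) tuples.
    Offsets are relative to whatever event the windows are locked to.
    """
    return [
        (f"TW{i + 1}", t, t + width_ms)
        for i, t in enumerate(range(start_ms, end_ms, width_ms))
    ]


windows = time_windows()

# The five windows retained for Experiment 2, as listed in the text.
chosen = {"TW1", "TW2", "TW3", "TW6", "TW7"}
selected = [w for w in windows if w[0] in chosen]

for label, lo, hi in selected:
    print(f"{label}: {lo}-{hi} ms")
```

      Passing a different `start_ms` and `end_ms` re-anchors the same eight windows to another reference point without changing the selection logic.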

      In our prior work (Zhao et al., 2021, J. Neurosci), we employed a carefully controlled experimental design incorporating two key factors: (1) gesture-speech semantic congruency (serving as our primary measure of integration) and (2) gesture-speech gender congruency (implemented as a matched control factor). Using a time-locked, double-pulse TMS protocol, we systematically targeted each of the eight predefined time windows (TWs) within the left IFG, left pMTG, or vertex (serving as a sham control condition). Our results demonstrated a TW-selective disruption of gesture-speech integration, indexed by the semantic congruency effect (i.e., a cost of reaction time because of semantic conflict), when stimulating the left pMTG in TW1, TW2, and TW7 and when stimulating the left IFG in TW3 and TW6. Crucially, no significant effects were observed during either sham stimulation or the controlled gender congruency factor (Figure 3 from Zhao et al., 2021, J. Neurosci).

      This triple dissociation - showing effects only for semantic integration, only in active stimulation, and only at specific time points - provides compelling causal evidence that IFG-pMTG connectivity plays a temporally precise role in gesture-speech integration.

      Noted that this work has undergone rigorous peer review by two independent experts who both endorsed our methodological approach. Their original evaluations, provided below:

      Reviewer 1: โ€œsignificance: Using chronometric TMS-stimulation the data of this experiment suggests a feedforward information flow from left pMTG to left IFG followed by an information flow from left IFG back to the left pMTG.ย  The study is the first to provide causal evidence for the temporal dynamics of the left pMTG and left IFG found during gesture-speech integration.โ€

      Reviewer 2: “Beyond the new results the manuscript provides regarding the chronometrical interaction of the left inferior frontal gyrus and middle temporal gyrus in gesture-speech interaction, the study more basically shows the possibility of unfolding temporal stages of cognitive processing within domain-specific cortical networks using short-time interval double-pulse TMS. Although this method also has its limitations, a careful study planning as shown here and an appropiate discussion of the results can provide unique insights into cognitive processing.”

      References:

      Willems, R.M., Ozyurek, A., and Hagoort, P. (2009). Differential roles for left inferior frontal and superior temporal cortex in multimodal integration of action and language. Neuroimage 47, 1992-2004. 10.1016/j.neuroimage.2009.05.066.

      Drijvers, L., Jensen, O., and Spaak, E. (2021). Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual information. Human Brain Mapping 42, 1138-1152. 10.1002/hbm.25282.

      Drijvers, L., and Ozyurek, A. (2018). Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditions. Brain and Language 177, 7-17. 10.1016/j.bandl.2018.01.003.

      Drijvers, L., van der Plas, M., Ozyurek, A., and Jensen, O. (2019). Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noise. Neuroimage 194, 55-67. 10.1016/j.neuroimage.2019.03.032.

      Holle, H., and Gunter, T.C. (2007). The role of iconic gestures in speech disambiguation: ERP evidence. J Cognitive Neurosci 19, 1175-1192. 10.1162/jocn.2007.19.7.1175.

      Kita, S., and Ozyurek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48, 16-32. 10.1016/S0749-596x(02)00505-3.

      Bernardis, P., and Gentilucci, M. (2006). Speech and gesture share the same communication system. Neuropsychologia 44, 178-190. 10.1016/j.neuropsychologia.2005.05.007.

      Zhao, W.Y., Riggs, K., Schindler, I., and Holle, H. (2018). Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integration. Journal of Neuroscience 38, 1891-1900. 10.1523/Jneurosci.1748-17.2017.

      Zhao, W., Li, Y., and Du, Y. (2021). TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integration. The Journal of Neuroscience, 10356-10364. 10.1523/jneurosci.1355-21.2021.

      Hartwigsen, G., Bzdok, D., Klein, M., Wawrzyniak, M., Stockert, A., Wrede, K., Classen, J., and Saur, D. (2017). Rapid short-term reorganization in the language network. Elife 6. 10.7554/eLife.25964.

      Jackson, R.L., Hoffman, P., Pobric, G., and Ralph, M.A.L. (2016). The semantic network at work and rest: Differential connectivity of anterior temporal lobe subregions. Journal of Neuroscience 36, 1490-1501. 10.1523/JNEUROSCI.2999-15.2016.

      Humphreys, G. F., Lambon Ralph, M. A., & Simons, J. S. (2021). A Unifying Account of Angular Gyrus Contributions to Episodic and Semantic Cognition. Trends in neurosciences, 44(6), 452–463. https://doi.org/10.1016/j.tins.2021.01.006

      Bonner, M. F., & Price, A. R. (2013). Where is the anterior temporal lobe and what does it do?. The Journal of neuroscience : the official journal of the Society for Neuroscience, 33(10), 4213–4215. https://doi.org/10.1523/JNEUROSCI.0041-13.2013

      Comment 2: It could still equally well be the case that other regions or networks relevant for gesture-speech integration are targeted, and it can still be the case that these timewindows are not specific, and effects bleed into other time periods. There seems to be no experimental evidence here that this is not the case.

      The selection of IFG and pMTG as regions of interest was rigorously justified through multiple lines of evidence. First, a comprehensive meta-analysis of fMRI studies on gesture-speech integration consistently identified these regions as central nodes (see response to comment 1). Second, our own previous work (Zhao et al., 2018, JN; 2021, JN) provided direct empirical validation of their involvement. Third, by employing the same experimental paradigm, we minimized the likelihood of engaging alternative networks. Fourth, even if other regions connected to IFG or pMTG might be affected by TMS, the distinct engagement of specific time windows of IFG and pMTG minimizes the likelihood of consistent influence from other regions.

      Regarding temporal specificity, our 2021 study (Zhao et al., 2021, JN, see details in response to comment 1) systematically examined the entire 0-320ms integration window and found that only select time windows showed significant effects for gesture-speech semantic congruency, while remaining unaffected during gender congruency processing. This double dissociation (significant effects for semantic integration but not gender processing in specific windows) rules out broad temporal spillover.

      Comment 3: To be more specific, the authors write that double-pulse TMS has been widely used in previous studies (as found in their table). However, the studies cited in the table do not necessarily demonstrate the level of spatial and temporal specificity required to disentangle the contributions of tightly-coupled brain regions like the IFG and pMTG during the speech-gesture integration process. pMTG and IFG are located in very close proximity, and are known to be functionally and structurally interconnected, something that is not necessarily the case for the relatively large and/or anatomically distinct areas that the authors mention in their table.

      Our methodological approach is strongly supported by an established body of research employing double-pulse TMS (dpTMS) to investigate neural dynamics across both primary motor and higher-order cognitive regions. As documented in Author response table 1, multiple studies have successfully applied this technique to: (1) primary motor areas (tongue and lip representations in M1), and (2) semantic processing regions (including pMTG, PFC, and ATL). Particularly relevant precedents include:

      (1) Teige et al. (2018, Cortex): Demonstrated precise spatial and temporal specificity by applying 40ms-interval dpTMS to ATL, pMTG, and mid-MTG across multiple time windows (0-40ms, 125-165ms, 250-290ms, 450-490ms), revealing distinct functional contributions from ATL versus pMTG.

      (2) Vernet et al. (2015, Cortex): Successfully dissociated functional contributions of right IPS and DLPFC using 40ms-interval dpTMS, despite their anatomical proximity and functional connectivity.

      These studies confirm double-pulse TMS can discriminate interconnected nodes at short timescales. Our 2021 study further validated this for IFG-pMTG.

      Author response table 2.

      Double-pulse TMS studies on brain regions over 3-60 ms time interval

      References:

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Comment 4: But also more in general: The mere fact that these methods have been used in other contexts does not necessarily mean they are appropriate or sufficient for investigating the current research question. Likewise, the cognitive processes involved in these studies are quite different from the complex, multimodal integration of gesture and speech. The authors have not provided a strong theoretical justification for why the temporal dynamics observed in these previous studies should generalize to the specific mechanisms of gesture-speech integration.

      The neurophysiological mechanisms underlying double-pulse TMS (dpTMS) are well-characterized. While it is established that single-pulse TMS can produce brief artifacts (typically within 0–10 ms) due to transient cortical depolarization (Romero et al., 2019, NC), the dynamics of dpTMS involve more intricate inhibitory interactions. Specifically, the first pulse increases membrane conductance via GABAergic shunting inhibition, effectively lowering membrane resistance and attenuating the excitatory impact of the second pulse. This results in a measurable reduction in cortical excitability at the paired-pulse interval, as evidenced by suppressed motor evoked potentials (MEPs) (Paulus & Rothwell, 2016, J Physiol). Importantly, this neurophysiological mechanism is independent of cognitive domain and has been robustly demonstrated across multiple functional paradigms.

      In our study, we did not rely on previously reported timing parameters but instead employed a dpTMS protocol using a 40-ms inter-pulse interval. Based on the inhibitory dynamics of this protocol, we designed a sliding temporal window sufficiently broad to encompass the integration period of interest. This approach enabled us to capture and localize the critical temporal window associated with ongoing integrative processing in the targeted brain region.

      We acknowledge that the previous phrasing may have been ambiguous. A clearer and more detailed description of the dpTMS protocol has now been provided in Lines 88-92: “To this end, we employed chronometric double-pulse transcranial magnetic stimulation, which is known to transiently reduce cortical excitability at the inter-pulse interval[27]. Within a temporal period broad enough to capture the full duration of gesture–speech integration[28], we targeted specific timepoints previously implicated in integrative processing within IFG and pMTG [23].”

      References:

      Romero, M.C., Davare, M., Armendariz, M. et al. Neural effects of transcranial magnetic stimulation at the single-cell level. Nat Commun 10, 2642 (2019). https://doi.org/10.1038/s41467-019-10638-7

      Paulus W, Rothwell JC. Membrane resistance and shunting inhibition: where biophysics meets state-dependent human neurophysiology. J Physiol. 2016 May 15;594(10):2719-28. doi: 10.1113/JP271452. PMID: 26940751; PMCID: PMC4865581.

      Obermeier, C., & Gunter, T. C. (2015). Multisensory Integration: The Case of a Time Window of Gesture-Speech Integration. Journal of Cognitive Neuroscience, 27(2), 292-307. https://doi.org/10.1162/jocn_a_00688

      Comment 5: Moreover, the studies cited in the table provided by the authors have used a wide range of interpulse intervals, from 20 ms to 100 ms, suggesting that the temporal precision required to capture the dynamics of gesture-speech integration (which is believed to occur within 200-300 ms; Obermeier & Gunter, 2015) may not even be achievable with their 40 ms time windows.

      Double-pulse TMS has been empirically validated across neurocognitive studies as an effective method for establishing causal temporal relationships in cortical networks, with demonstrated sensitivity at timescales spanning 3-60 ms. Our selection of a 40-ms interpulse interval represents an optimal compromise between temporal precision and physiological feasibility, as evidenced by its successful application in dissociating functional contributions of interconnected regions including ATL/pMTG (Teige et al., 2018) and IPS/DLPFC (Vernet et al., 2015). This methodological approach combines established experimental rigor with demonstrated empirical validity for investigating the precisely timed IFG-pMTG dynamics underlying gesture-speech integration, as shown in our current findings and prior work (Zhao et al., 2021).

      Our experimental design comprehensively sampled the 0-320 ms post-stimulus period, fully encompassing the critical 200-300 ms window associated with gesture-speech integration, as raised by the reviewer. Notably, our results revealed temporally distinct causal dynamics within this period: the significantly reduced semantic congruency effect emerged at IFG at 200-240ms, followed by feedback projections from IFG to pMTG at 240-280ms. This precisely timed interaction provides direct neurophysiological evidence for the proposed architecture of gesture-speech integration, demonstrating how these interconnected regions sequentially contribute to multisensory semantic integration.

      Comment 6: I do appreciate the extra analyses that the authors mention. However, my 5th comment is still unanswered: why not use entropy scores as a continuous measure?

      Analyses with MI and entropy as continuous variables were conducted using Representational Similarity Analysis (RSA) (Popal et al., 2019). These analyses aimed to build a model to predict neural responses based on these feature metrics.

      To capture dynamic temporal features indicative of different stages of multisensory integration, we segmented the EEG data into overlapping time windows (40 ms in duration with a 10 ms step size). The 40 ms window was chosen based on the TMS protocol used in Experiment 2, which also employed a 40 ms time window. The 10 ms step size (equivalent to 5 time points) was used to detect subtle shifts in neural responses that might not be captured by larger time windows, allowing for a more granular analysis of the temporal dynamics of neural activity.
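
      As a rough illustration of the windowing described above, the following sketch cuts one epoch into overlapping 40 ms windows with a 10 ms step. It is not the authors' pipeline: the 500 Hz sampling rate (implied by "10 ms = 5 time points"), the 1000 ms epoch length, the random data, and the helper name `sliding_windows` are all assumptions made for the example.

```python
import numpy as np

def sliding_windows(epoch, win_len=20, step=5):
    """Split a (n_channels, n_samples) epoch into overlapping windows.

    Returns an array of shape (n_channels, win_len, n_windows).
    win_len=20 and step=5 correspond to 40 ms and 10 ms at an assumed 500 Hz.
    """
    n_channels, n_samples = epoch.shape
    # Start index of every window that fits completely inside the epoch.
    starts = np.arange(0, n_samples - win_len + 1, step)
    return np.stack([epoch[:, s:s + win_len] for s in starts], axis=-1)

# Hypothetical epoch: 42 channels, 500 samples (1000 ms at 500 Hz).
rng = np.random.default_rng(0)
epoch = rng.standard_normal((42, 500))
windows = sliding_windows(epoch)
print(windows.shape)  # (42, 20, 97)
```

      Under these assumptions the window count comes out at 97, which matches the number of time windows reported in the response.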

      Following segmentation, the EEG data were reshaped into a four-dimensional matrix (42 channels × 20 time points × 97 time windows × 20 features). To construct a neural similarity matrix, we averaged the EEG data across time points within each channel and each time window. The resulting matrix was then processed using the pdist function to compute pairwise distances between adjacent data points. This allowed us to calculate correlations between the neural matrix and three feature similarity matrices, which were constructed in a similar manner. These three matrices corresponded to (1) gesture entropy, (2) speech entropy, and (3) mutual information (MI). This approach enabled us to quantify how well the neural responses corresponded to the semantic dimensions of gesture and speech stimuli at each time window.
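
      The correlation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the last axis is taken to index 20 stimulus items, pairwise distance is the absolute difference between items (what a Euclidean `pdist` reduces to for one value per item), the correlation is Pearson's r, and the data and the entropy vector are random placeholders. The helper names are hypothetical.

```python
import numpy as np

def condensed_distances(values):
    """Pairwise absolute differences between items (upper triangle only)."""
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)
    return np.abs(values[i] - values[j])

def rsa_correlation(neural, feature):
    """Pearson r between the neural and feature dissimilarity vectors."""
    return np.corrcoef(condensed_distances(neural),
                       condensed_distances(feature))[0, 1]

rng = np.random.default_rng(1)
# Placeholder data: channels x time points x time windows x items.
data = rng.standard_normal((42, 20, 97, 20))
# Placeholder feature values, e.g. one entropy score per item.
feature = rng.uniform(0.0, 3.0, size=20)

# Average over the time points inside each window: (42, 97, 20).
window_means = data.mean(axis=1)

# One RSA correlation per channel and time window: (42, 97).
r_map = np.array([[rsa_correlation(window_means[c, w], feature)
                   for w in range(window_means.shape[1])]
                  for c in range(window_means.shape[0])])
print(r_map.shape)  # (42, 97)
```

      Each entry of `r_map` then says how closely the item-by-item neural dissimilarities at one channel and window track the dissimilarities in the chosen feature.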

      To determine the significance of the correlations between neural activity and feature matrices, we conducted 1000 permutation tests. In this procedure, we randomized the data or feature matrices and recalculated the correlations repeatedly, generating a null distribution against which the observed correlation values were compared. Statistical significance was determined if the observed correlation exceeded the null distribution threshold (p < 0.05). This permutation approach helps mitigate the risk of spurious correlations, ensuring that the relationships between the neural data and feature matrices are both robust and meaningful.
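
      A permutation test of this kind can be sketched as below. The response does not say exactly what was shuffled or whether the test was one- or two-sided, so the choices here (shuffling the item order of the feature vector, a one-sided test, the add-one p-value estimate, and the function name `permutation_p_value`) are assumptions for illustration only.

```python
import numpy as np

def permutation_p_value(neural, feature, n_perm=1000, seed=0):
    """One-sided permutation p-value for an RSA correlation.

    The item order of `feature` is shuffled on each permutation, the feature
    dissimilarities are recomputed, and the observed r is compared with the
    resulting null distribution.
    """
    neural = np.asarray(neural, dtype=float)
    feature = np.asarray(feature, dtype=float)
    rng = np.random.default_rng(seed)
    i, j = np.triu_indices(len(neural), k=1)
    neural_d = np.abs(neural[i] - neural[j])

    def corr(f):
        return np.corrcoef(neural_d, np.abs(f[i] - f[j]))[0, 1]

    observed = corr(feature)
    null = np.array([corr(rng.permutation(feature)) for _ in range(n_perm)])
    # Add-one correction keeps the estimated p-value strictly above zero.
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

# Placeholder example: a neural pattern that closely follows the feature.
rng = np.random.default_rng(2)
feature = rng.uniform(0.0, 3.0, size=20)
neural = feature + 0.01 * rng.standard_normal(20)
observed, p = permutation_p_value(neural, feature)
print(observed, p)
```

      With 1000 permutations the smallest attainable p-value under this estimator is 1/1001, which is worth keeping in mind when reading values reported as p < 0.001.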

      Finally, significant correlations were subjected to clustering analysis, which grouped similar neural response patterns across time windows and channels. This clustering allowed us to identify temporal and spatial patterns in the neural data that consistently aligned with the semantic features of gesture and speech stimuli, thus revealing the dynamic integration of these multisensory modalities across time. Results are as follows:

      (1) Two significant clusters were identified for gesture entropy (Figure 1 left). The first cluster was observed between 60-110 ms (channels F1 and F3), with correlation coefficients (r) ranging from 0.207 to 0.236 (p < 0.001). The second cluster was found between 210-280 ms (channel O1), with r-values ranging from 0.244 to 0.313 (p < 0.001).

      (2) For speech entropy (Figure 1 middle), significant clusters were detected in both early and late time windows. In the early time windows, the largest significant cluster was found between 10-170 ms (channels F2, F4, F6, FC2, FC4, FC6, C4, C6, CP4, and CP6), with r-values ranging from 0.151 to 0.340 (p = 0.013), corresponding to the P1 component (0-100 ms). In the late time windows, the largest significant cluster was observed between 560-920 ms (across the whole brain, all channels), with r-values ranging from 0.152 to 0.619 (p = 0.013).

      (3) For mutual information (MI) (Figure 1 right), a significant cluster was found between 270-380 ms (channels FC1, FC2, FC3, FC5, C1, C2, C3, C5, CP1, CP2, CP3, CP5, FCz, Cz, and CPz), with r-values ranging from 0.198 to 0.372 (p = 0.001).

      Author response image 1.

      Results of RSA analysis.

      These additional findings suggest that even using a different modeling approach, neural responses, as indexed by feature metrics of entropy and mutual information, are temporally aligned with distinct ERP components and ERP clusters, as reported in the current manuscript. This alignment serves to further consolidate the results, reinforcing the conclusion we draw. Considering the length of the manuscript, we did not include these results in the current manuscript.

      Reference:

      Popal, H., Wang, Y., & Olson, I. R. (2019). A guide to representational similarity analysis for social neuroscience. Social cognitive and affective neuroscience, 14(11), 1243-1253.

      Comment 7: In light of these concerns, I do not believe the authors have adequately demonstrated the spatial and temporal specificity required to disentangle the contributions of the IFG and pMTG during the gesture-speech integration process. While the authors have made a sincere effort to address the concerns raised by the reviewers, and have done so with a lot of new analyses, I remain doubtful that the current methodological approach is sufficient to draw conclusions about the causal roles of the IFG and pMTG in gesture-speech integration.

      To sum up:

      (1) Empirical validation from our prior work (Zhao et al., 2018, 2021, JN): The selection of IFG and pMTG as target regions was informed by both: (1) a comprehensive meta-analysis of fMRI studies on gesture-speech integration, and (2) our own prior causal evidence from Zhao et al. (2018, J Neurosci), with detailed stereotactic coordinates provided in the attached Response to Editors and Reviewers letter. The temporal parameters were similarly grounded in empirical data from Zhao et al. (2021, J Neurosci), where we systematically examined eight consecutive 40-ms windows spanning the full integration period (0-320 ms). This study revealed a triple dissociation of effects - occurring exclusively during: (i) semantic integration (but not control tasks), (ii) active stimulation (but not sham), and (iii) specific time windows (but not all time windows) - providing robust causal evidence for the spatiotemporal specificity of IFG-pMTG interactions in gesture-speech processing. Notably, all reviewers recognized the methodological strength of this dpTMS approach in their evaluations (see attached JN assessment for details).

      (2) Convergent evidence from Experiment 3: Our study employed a multi-method approach incorporating three complementary experimental paradigms, each utilizing distinct neurophysiological techniques to provide converging evidence. Specifically, Experiment 3 implemented high-temporal-resolution EEG, which independently replicated the time-sensitive dynamics of gesture-speech integration observed in our double-pulse TMS experiments. The remarkable convergence between these methodologically independent approaches - demonstrating consistent temporal staging of IFG-pMTG interactions across both causal (TMS) and correlational (EEG) measures - significantly strengthens the validity and generalizability of our conclusions regarding the neural mechanisms underlying multisensory integration.

      (3) Established precedents in double-pulse TMS literature: The double-pulse TMS methodology employed in our study is firmly grounded in established neuroscience research. As documented in our detailed Response to Editors and Reviewers letter (citing 11 representative studies), dpTMS has been extensively validated for investigating causal temporal dynamics in cortical networks, with demonstrated sensitivity at timescales ranging from 3-60 ms. Particularly relevant precedents include: 1. Teige et al. (2018, Cortex) successfully dissociated functional contributions of anatomically proximal regions (ATL vs. pMTG vs. mid-MTG) using 40-ms-interval double-pulse TMS; 2. Vernet et al. (2015, Cortex) effectively distinguished neural processing in interconnected frontoparietal regions (right IPS vs. DLPFC) using 40-ms double-pulse TMS parameters. Both parameters are identical to those employed in our current study.

      (4) Neurophysiological Plausibility: The neurophysiological basis for the transient double-pulse TMS effects is well-established through mechanistic studies of TMS-induced cortical inhibition (Romero et al., 2019; Paulus & Rothwell, 2016).

      Taken together, we respectfully submit that our methodology provides robust support for our conclusions.

    2. eLife Assessment

      This useful study uses brain stimulation and electroencephalography to study speech-gesture integration. It investigates the role of frontotemporal regions in integrating linguistic and extra-linguistic information during communication, focusing on the inferior frontal gyrus and posterior middle temporal gyrus. Reliance on activation patterns of tightly-coupled brain regions over short timescales leads to incomplete support for the study's conclusions due to conceptual and methodological limitations.

    3. Reviewer #1 (Public review):

      Summary:

      The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).

      Strengths:

      A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.

      Comments on revisions:

      I thank the authors for their careful responses to my comments. However, I remain unconvinced by their argumentation regarding the specificity of their spatial targeting and the time-windows that they used.

      I do not believe the authors have adequately demonstrated the spatial and temporal specificity required to disentangle the contributions of the IFG and pMTG during the gesture-speech integration process. While the authors have made a sincere effort to address the concerns raised by the reviewers, and have done so with a lot of new analyses, I remain doubtful that the current methodological approach is sufficient to draw conclusions about the causal roles of the IFG and pMTG in gesture-speech integration.

    4. Reviewer #2 (Public review):

      Summary

      The study is an innovative and fundamental study that clarified important aspects of brain processes for integration of information from speech and iconic gesture (i.e., gesture that depicts action, movement, and shape), based on tDCS, TMS and EEG experiments. They evaluated their speech and gesture stimuli in information-theoretic ways and calculated how informative speech is (i.e., entropy), how informative gesture is, and how much shared information speech and gesture encode. The tDCS and TMS studies found that the left IFG and pMTG, the two areas that were activated in fMRI studies on speech-gesture integration in the previous literature, are causally implicated in speech-gesture integration. The sizes of tDCS and TMS effects are correlated with entropy of the stimuli or mutual information, which indicates that the effects stem from the modulation of information decoding/integration processes. The EEG study showed that various ERP (event-related potential, e.g., N1-P2, N400, LPC) effects that have been observed in speech-gesture integration experiments in the previous literature are modulated by the entropy of speech/gesture and mutual information. This makes it clear that these effects are related to information decoding processes. The authors propose a model of how the speech-gesture integration process unfolds in time, and how IFG and pMTG interact with each other in that process.

      Strengths:

      The key strength of this study is that the authors used information-theoretic measures of their stimuli (i.e., entropy and mutual information between speech and gesture) in all of their analyses. This made it clear that the neuro-modulation (tDCS, TMS) affected information decoding/integration and ERP effects reflect information decoding/integration. This study used tDCS and TMS methods to demonstrate that left IFG and pMTG are causally involved in speech-gesture integration. The sizes of tDCS and TMS effects are correlated with information-theoretic measures of the stimuli, which indicates that the effects indeed stem from disruption/facilitation of information decoding/integration process (rather than generic excitation/inhibition). The authors' results also showed correlations between information-theoretic measures of stimuli and various ERP effects. This indicates that these ERP effects reflect the information decoding/integration process.

      Weaknesses:

      The "mutual information" cannot capture all types of interplay of the meaning of speech and gesture. The mutual information is calculated based on what information can be decoded from speech alone and what information can be decoded from gesture alone. However, when speech and gesture are combined, a novel meaning can emerge, which cannot be decoded from a single modality alone. For example, a person produces a gesture of writing something with a pen, while saying "He paid". The speech-gesture combination can be interpreted as "paying by signing a cheque". It is highly unlikely that this meaning is decoded when people hear speech only or see gestures only. The current study cannot address how such speech-gesture integration occurs in the brain, and what ERP effects may reflect such a process. Future studies can classify different types of speech-gesture integration and investigate neural processes that underlie each type. Another important topic for future studies is to investigate how the neural processes of speech-gesture integration change when the relative timing between the speech stimulus and the gesture stimulus changes.
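
      For readers less familiar with the measures under discussion, the standard definitions of Shannon entropy and mutual information can be sketched as follows. This only illustrates the textbook formulas. How the authors estimated the probabilities for their stimuli is not described in this section, so the two toy joint distributions and the function names are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for a joint probability table."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    joint_cells = [p for row in joint for p in row]
    return entropy(row_marginal) + entropy(col_marginal) - entropy(joint_cells)

# Toy joint distributions over two speech and two gesture interpretations.
independent = [[0.25, 0.25], [0.25, 0.25]]  # the modalities share nothing
redundant = [[0.5, 0.0], [0.0, 0.5]]        # the modalities fully overlap

print(mutual_information(independent))  # 0.0
print(mutual_information(redundant))    # 1.0
```

      Both toy cases assign meanings that each modality could already convey alone. An emergent meaning of the kind the "He paid" example describes falls outside what a measure built from single-modality decoding can register.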

      Comments on the previous round of revisions: The authors addressed my concerns well.

    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

    2. Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees. The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

    3. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season (budset, end of radial expansion, leaf coloring, etc.) relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can often happen MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

    4. Author Response:

      We would like to thank the reviewers and editors for your consideration of our manuscript, your kind comments about the value of our study, and for providing constructive feedback. We intend to submit a revised version of the manuscript and address the concerns and recommendations. This will include improvements to the statistical analyses, text content, and text format.

      Specifically, we will:

      1. Revise the text to better explain the experimental methods, interpretation of results and how our findings are situated in the literature. Although we still believe that there is sufficient evidence to suggest that temperate tree species other than Fagus sylvatica may show similar patterns, we understand the reviewers' concerns regarding these statements and will revise them.

      2. Add a supplemental analysis of leaf chlorophyll content data to use leaf discolouration as an alternative marker of the end of the growing season. On this we would like to make two important points. Firstly, we agree with the reviewers that bud set often occurs before leaf discolouration. In experiment 1, bud set occurred on average on day-of-year (DOY) 262, onset of leaf senescence (last day when leaf chlorophyll content fell below 90% of its measured maximum) occurred on average at the same time (DOY 261), and mid-senescence (50% leaf discolouration) occurred on DOY 320. We do not agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and leaf discolouration) are similar, even if only directionally. Secondly, shifts in bud set timing will remain the key focus of the manuscript as we believe it has greater physiological relevance to plant development, whereas leaf discolouration may simply follow bud set as a symptom of the completion of growth (reduced sink activity).

      3. Address points raised about potential additional drivers of our observed phenological shifts. For example, photoperiod effects and the Solstice-as-Phenology-Switch hypothesis are not mutually exclusive; the annual progression of photoperiod is fundamental to how we suggest the switch is regulated (please see L66-68 in the original manuscript). The reviewers also comment on the significant differences in soil water content between the treatment groups in Fig. S1. However, all pots were watered sufficiently to avoid water deficit and all efforts were made to minimise differences in water availability. A provisional analysis shows only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate) had significantly different soil water content, a pair whose differences are not discussed.

    1. eLife Assessment

      This landmark study describes the structure of the human RAD51 filament with a recombination intermediate called the displacement loop (D-loop). Using cryogenic structural, biochemical, and single-molecule analyses, the authors provide compelling evidence on how the RAD51 filament promotes strand exchange between single-stranded and double-stranded DNAs. The findings are highly relevant to the fields of homologous recombination, DNA repair, and genome stability.

    2. Reviewer #1 (Public review):

      Summary:

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA.

      Strong points:

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange.

      Comments on revisions:

      The authors nicely address most of the previous points.

    3. Reviewer #2 (Public review):

      Homologous recombination is essential for DNA double-strand break repair, with RAD51-catalyzed strand exchange at its core. This study presents a 2.64 Å resolution cryogenic electron microscopy structure of the RAD51 D-loop complex, achieved through reconstitution of a RAD51 mini-filament. The structure uncovers how specific RAD51 residues drive strand exchange, offering atomic-level insight into the mechanics of eukaryotic HR and DNA repair.

      Comments on revisions:

      Authors acknowledged:

      "We acknowledge that there exists an extensive body of literature that has investigated the polarity of strand exchange by RecA and RAD51 under a variety of experimental conditions, and we have added a brief comment to the text to reflect this, as well as some of the key citations. Undoubtedly, and as we also mention in our reply to the public reviews, further experimental work will be needed for a full reconciliation of the available evidence."

      In the revised manuscript, this is reflected in the statement:

      "Our mechanistic interpretation of static D-loop structures awaits full reconciliation with earlier efforts to determine strand-exchange polarity for RecA and RAD51 measured under a variety of experimental conditions."

      Among the four cited studies, my understanding (as a person who has never studied this subject of polarity) is as follows:
      • References 50 (EMBO J. 1997), 51 (Cell. 1995), and 52 (Nature. 2008) suggest that the strand exchange by human RAD51 occurs with a polarity opposite to that of RecA, that is, in the 5′→3′ direction relative to the complementary strand, or 3′→5′ relative to the initiating single-stranded DNA (isDNA).
      • In contrast, reference 49 (PNAS 1998) proposed that 5′→3′ polarity (relative to isDNA) is conserved across RecA, human RAD51, and yeast RAD51.

      Given the substantial structural analysis provided in the current manuscript, it would strengthen the work to include a concise description of these earlier biochemical findings, rather than citing them without context. This would benefit readers who are not familiar with the longstanding studies in the field and allow for a more informed interpretation of how the structural observations may reconcile or contrast with previous work.

    4. Reviewer #3 (Public review):

      Summary:

      Building on their previous pioneering expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filament during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA.

      Strengths:

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed and interpreted. These results provide novel insights into RAD51's function in HR.

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM.

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing are also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange.

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses:

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model compared to RecA or developing methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway.

      The significance of the work for the DNA repair field and beyond:

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA.

      Strengths:

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange.

      Weaknesses:

      The authors need more careful text writing. Without page and line numbers, it is hard to give comments.

      We would like to thank the reviewer for their kind words of appreciation of our work.

      Reviewer #2 (Public review):

      Summary:

      Homologous recombination (HR) is a critical pathway for repairing double-strand DNA breaks and ensuring genomic stability. At the core of HR is the RAD51-mediated strand-exchange process, in which the RAD51-ssDNA filament binds to homologous double-stranded DNA (dsDNA) to form a characteristic D-loop structure. While decades of biochemical, genetic, and single-molecule studies have elucidated many aspects of this mechanism, the atomic-level details of the strand-exchange process remained unresolved due to a lack of atomic-resolution structure of RAD51 D-loop complex.

      In this study, the authors achieved this by reconstituting a RAD51 mini-filament, allowing them to solve the RAD51 D-loop complex at 2.64 Å resolution using a single particle approach. The atomic resolution structure reveals how specific residues of RAD51 facilitate the strand exchange reaction. Ultimately, this work provides unprecedented structural insight into the eukaryotic HR process and deepens the understanding of RAD51 function at the atomic level, advancing the broader knowledge of DNA repair mechanisms.

      Strengths:

      The authors overcame the challenge of RAD51's helical symmetry by designing a minifilament system suitable for single-particle cryo-EM, enabling them to resolve the RAD51 D-loop structure at 2.64 Å without imposed symmetry. This high resolution revealed precise roles of key residues, including F279 in Loop 2, which facilitates strand separation, and basic residues on site II that capture the displaced strand. Their findings were supported by mutagenesis, strand exchange assays, and single-molecule analysis, providing strong validation of the structural insights.

      Weaknesses:

      Despite the detailed structural data, some structure-based mutagenesis data interpretation lacks clarity. Additionally, the proposed 3′-to-5′ polarity of strand exchange relies on assumptions from static structural features, such as stronger binding of the 5′-arm, which are not directly supported by other experiments. This makes the directional model compelling but contradicts several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600).

      Overall:

      The 2.6 Å resolution cryoEM structure of the RAD51 D-loop complex provides remarkably detailed insights into the residues involved in D-loop formation. The high-quality cryoEM density enables precise placement of each nucleotide, which is essential for interpreting the molecular interactions between RAD51 and DNA. Particularly, the structural analysis highlights specific roles for key domains, such as the N-terminal domain (NTD), in engaging the donor DNA duplex.

      This structural interpretation is further substantiated by single-molecule fluorescence experiments using the KK39,40AA NTD mutant. The data clearly show a significant reduction in D-loop formation by the mutant compared to wild-type, supporting the proposed functional role of the NTD observed in the cryoEM model.

      However, the strand exchange activity interpretation presented in Figure 5B could benefit from a more rigorous experimental design. The current assay measures an increase in fluorescence intensity, which depends heavily on the formation of RAD51-ssDNA filaments. As shown in Figure S6A, several mutants exhibit reduced ability to form such filaments, which could confound the interpretation of strand exchange efficiency. To address this, the assay should either: (1) normalize for equivalent levels of RAD51-ssDNA filaments across samples, or (2) compare the initial rates of fluorescence increase (i.e., the slope of the reaction curve), rather than endpoint fluorescence, to better isolate the strand exchange activity itself.
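
      As an illustration of option (2) only, the sketch below estimates an initial rate as the least-squares slope over the first few time points of a fluorescence trace. It is not the analysis used in the manuscript: the function name `initial_rate`, the `window` parameter, and the two synthetic traces are hypothetical, and the choice of how many early points count as "initial" is an assumption that would need to be justified for real data.

```python
import math


def initial_rate(times, signal, window):
    """Least-squares slope of signal versus time over the first `window` points."""
    t = list(times[:window])
    y = list(signal[:window])
    n = len(t)
    if n < 2 or len(y) != n:
        raise ValueError("need at least two matching (time, signal) points")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Sum of squared deviations in time, and the time-signal cross term
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("time points in the window must not all be equal")
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


# Hypothetical traces in arbitrary units (NOT data from the manuscript):
# both approach the same plateau, but with different rate constants.
times = [0, 10, 20, 30, 40, 50]
fast = [100 * (1 - math.exp(-0.05 * t)) for t in times]
slow = [100 * (1 - math.exp(-0.01 * t)) for t in times]

rate_fast = initial_rate(times, fast, window=3)
rate_slow = initial_rate(times, slow, window=3)
print(rate_fast, rate_slow)
```

      Comparing slopes from the early, approximately linear part of each curve separates how quickly the reaction starts from how far it eventually proceeds, which is the distinction the comment above draws between initial rate and endpoint fluorescence.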

      We agree with the reviewer that the reduced filament-forming ability of some of the RAD51 mutants complicates a straightforward interpretation of their strand-exchange assay. Interestingly, the RAD51 mutants that appear most impaired are the esDNA-capture mutants that do not contact the ssDNA in the structure of the pre-synaptic filament. However, the RAD51 NTD mutants, that display the most severe defect in strand-exchange, have a near-WT filament forming ability.

      Based on the structural features of the D-loop, the authors propose that strand pairing and exchange initiate at the 3'-end of the complementary strand in the donor DNA and proceed with a 3'-to-5' polarity. This conclusion, drawn from static structural observations, contrasts with several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). While the structural model is compelling and methodologically robust, this discrepancy underscores the need for further experiments.

      We would like to thank the reviewer for highlighting the importance of our findings to our understanding of the mechanism of homologous recombination.

      The reviewer correctly points out that the polarity of strand exchange by RecA and RAD51 is an extensively researched topic that has been characterised in several authoritative studies. In our paper, we simply describe the mechanistic insights obtained from the structural D-loop models of RAD51 (our work) and RecA (Yang et al, PMID: 33057191). The structures illustrate a very similar mechanism of D-loop formation that proceeds with opposite polarity of strand exchange for RAD51 and RecA. Comparison of the D-loop structures for RecA and RAD51 provides an attractive explanation for the opposite polarity, as caused by the different positions of their dsDNA-binding domains in the filament structure.

      We agree with the reviewer that further investigation will be needed for an adequate rationalisation of the available evidence. We will mention the relevant literature in the revised version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Building on their previous pioneering expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filaments during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA.

      Strengths:

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed, and interpreted. These results provide novel insights into RAD51's function in HR.

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM.

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F, and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange.

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses:

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model compared to RecA or developing methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway.

      The significance of the work for the DNA repair field and beyond:

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery of biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies.

      We thank the reviewer for their positive comments on the significance of our work. Concerning the proposed polarity of strand exchange based on our structural finding, please see our reply to the previous reviewer; we agree with the reviewer that further experimentation will be needed to reach a settled view on this.

      Testing the functional effects of the RAD51 mutants on HR in cells was not an aim of the current work but we agree that it would be a very interesting experiment, which would likely provide further important insights into the mechanism of strand exchange at the core of the HR reaction.

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) Structural analysis showed a critical role of F279 in the L2 loop. However, the biochemical study showed that the F279A substitution did not cause a strong defect in the in vitro strand exchange, as shown in Figure 5B. Moreover, a previous study by Matsuo et al. (FEBS J, 2006; ref 43) showed human RAD51-F279A is proficient in the in vitro strand exchange. These suggest that human RAD51 F279 is not critical for the strand exchange. The authors need more discussions of the role of F279 or the L2 for the RAD51-mediated reactions in the Discussion.

      In the strand-exchange assay of Figure 5B, the F279A mutant shows the mildest phenotype, in agreement with the findings of Matsuo et al. Accordingly, in the text we describe the F279A mutant as having a “modest impact” on strand-exchange.

      We have now added a brief comment to the relevant text, pointing out that the results of the strand exchange assay for F279A are in agreement with the previous findings by Matsuo et al., and adding the reference.

      (2) In some parts, the authors cited the newest references rather than the paper describing the original findings. For RAD51 paralogs, why are these three (refs 21,22, 23) selected here? For FIGNL1, why is only one (ref 24) chosen?

      The cited publications were chosen to acquaint the reader with the latest structural and mechanistic advances about the function of some of the most important and well-studied recombination mediator proteins. For completeness, we have now added a further reference for FIGNL1 (Ito, Masaru et al, Nat Comm, 2023) in the Introduction, to provide the reader with an additional pointer to our current knowledge about the mechanism of FIGNL1 in Homologous Recombination.

      Minor points:

      (1) Page 3, line 1 in the second paragraph, the reaction of "HR": HR should be homology search and strand exchange. HR is used incorrectly throughout the text, please check them. Remove "strand-exchange" from ATPases in line 2.

      We believe that HR is used correctly in this context, as we refer to the biochemical reactions of HR, which includes the search for homology and strand exchange.

      We have removed “strand-exchange” from ATPases in line 2, as requested by the reviewer.

      (2) Supplementary Figure 1B, C, "EMSA" experiment: Please indicate an experimental condition in the legend: how ssDNA and dsDNA were mixed with RAD51. In (B), this is not an actual EMSA result, but rather a native gel analysis of reaction products with the D-loop. In (C), was the binding of RAD51 to the pre-formed D-loop examined? Which is correct here? Moreover, why do the authors need streptavidin in this experiment? Please explain why this is necessary for the EMSA assay. Please show where the Cy3 or Cy5 labels on the DNAs are in the schematic drawing.

      The conditions for the experiments of Supplementary figure 1B, C are reported in the Methods section.

      Panel B shows the mobility shifts of the ssDNA and dsDNA sequences in panel A, so it is appropriate to describe it as an EMSA.

      We did not examine the binding of RAD51 to a pre-formed D-loop.

      We used streptavidin in the experiment of Supplementary Figure 1C to show that streptavidin binding did not interfere with D-loop reconstitution.

      The position of the Cy3, Cy5 labels in the DNAs is reported in Table S1.

      (3) Figure S4B, page 6, line 6 from the top, 5'-arm and 3'-arm: please add them to the figure. And also, please explain what 5'-arm and 3'-arm are here in the text, as shown in lines 3-5 in the second paragraph of the same page.

      We thank the reviewer for spotting this slight incongruity. We have removed the reference to 5'- and 3'-arms of the donor DNA in the initial description of the D-loop (first paragraph of the “D-loop structure” section, 6 lines from the top), as the nomenclature for the arms of the donor DNA is introduced more appropriately in the following paragraph. Thus, there is no need to re-label Figure S4B; we note that the 5'- and 3'-labels are added to the arms of the donor DNA in Figure S4D.

      (4) Page 7, line 4, and Figure 2E, "C24": C24 should be C26 here (Figure 2D shows that position 24 in esDNA is "T").

      We thank the reviewer for spotting this typo, which is now corrected in the revised version of Figure 2 and in the text.

      (5) Page 8, line 1, K284: It would be nice to show "K284" in Figure 3F.

      We have added the side chain of K284 to Figure 3F, as suggested by the reviewer.

      (6) Page 8, second paragraph, line 3 from the bottom, "5'-arm" should be "3'-arm" for the binding of RAD51A NTD to ds DNA (Figure 4D).

      We thank the reviewer for spotting this typo, which is now corrected in the revised version of the text.

      Reviewer #2 (Recommendations for the authors):

      I understand that the strand exchange polarity of RAD51 should be opposite to that of RecA. But in the RecA manuscript (Nature 2020), it states (in the extended figure 1) "Because the mini-filament consists of fused RecA protomers, it does not reflect the effects a preferential polarity of RecA polymerization might have on the directionality of strand exchange. Also, our strand exchange reactions do not include the single-stranded DNA binding protein SSB that is involved in strand exchange in vivo and may sequester released DNA strands."

      We are aware that the findings by Yang et al, 2020 were obtained with a multi-protomeric RecA chimera and that their construct might not therefore recapitulate a potential effect of RecA polymerisation on the directionality of strand-exchange.

      Comparison of the RecA and RAD51 D-loop structures shows that RecA and RAD51 adopt the same asymmetric mechanism of D-loop formation, which begins at one arm of the donor DNA and proceeds with donor unwinding and strand invasion until the second arm is captured, completing D-loop formation. However, the cryoEM structures provide compelling evidence that, after engagement with the donor DNA, RecA and RAD51 proceed to unwind the donor with opposite polarity; the structures provide a clear rationale for this, because of the different position of their dsDNA-binding domains relative to the ATPase domain.

      We acknowledge that there exists an extensive body of literature that has investigated the polarity of strand exchange by RecA and RAD51 under a variety of experimental conditions, and we have added a brief comment to the text to reflect this, as well as some of the key citations. Undoubtedly, and as we also mention in our reply to the public reviews, further experimental work will be needed for a full reconciliation of the available evidence.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a minor comment regarding the DNA shown in the structural figures in this work. The authors have used different colours to differentiate between isDNA, esDNA, and csDNA for easier interpretation. However, these colour codes are inconsistent across Figures 1, 2, 3, and 5. This inconsistency makes it difficult to interpret which strand is which, particularly for readers unfamiliar with D-loops and strand invasion. A consistent colour scheme for the DNA strands would enhance the quality of the structural figures.

      We appreciate the reviewer's comment about the colour scheme of the strands in the D-loop. We chose a unique colour scheme for each figure, to help the reader focus on the particular structural features that we wanted to highlight in the figure. So for instance, in figure 1D we chose to highlight the relationship (complementary vs identical) of the donor DNA strands with the invading strand; in figure 2, the emphasis is on distinguishing the homologously paired dsDNA (pink) from the exchanged strand (magenta), as a consequence of L2 loop binding; etc.

      (2) I have another comment regarding the rationale behind naming the RAD51 protomers (A to H) within the structure, which could confuse general readers if not clearly explained. In this paper, the RAD51 protomer is RAD51_A when closest to the 3' end of the isDNA. I assume the authors chose this order because HR generates a 3' ssDNA overhang before strand invasion. It would be beneficial for the introduction and results sections to mention this property of the 3' ssDNA overhang and the reasoning behind this naming strategy. This explanation will help readers understand how it differs from other naming orders used in RecA/RAD51 with ssDNA, where protomer A is closer to the 5' ssDNA.

      We thank the reviewer for their insightful comment. We chose to name as chain A the RAD51 protomer nearest to the 3'-end of the isDNA to be consistent with the naming scheme that we use for all our published RAD51 filament structures.

      (3) I have highlighted some text within this paper that has contradicting parts for authors to clarify and correct:

      "Overall, the structural features of the RAD51 D-loop provide a strong indication that strand pairing and exchange begins at the 3'-end of the complementary strand in the donor DNA and progresses with 3'-to5' polarity (Fig. 5F)"

      "The observed 5'-to-3' polarity of strand-exchange by RAD51 is opposite to the 3'-to-5' polarity of bacterial RecA (Fig. S8), that was determined based on cryoEM structures of RecA D-loops".

      We thank the reviewer for alerting us to this inconsistency that has now been corrected in the revised manuscript.

      (4) Figure S8 last model: NTD should be CTD in the title; Figure 2B: resolution scale bar needs Å unit.

      We thank the reviewer for spotting this typo, which has now been corrected in the revised version of Figure S8.

      We couldn't find a missing resolution scale bar in Figure 2B; however, we have added a missing resolution bar with Å unit to Fig. S3B.

    1. eLife Assessment

      This paper examines selection on induced epigenetic variation ("Lamarckian evolution") in response to herbivory in Arabidopsis thaliana. The authors find weak evidence for such adaptation, which contrasts with a recently published study that reported extensive heritable variation induced by the environment. The authors convincingly demonstrate that the findings of the previous study were confounded by mix-ups of genetically distinct material, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the influence of heritable epigenetic variation on phenotypic variation and adaptation, this study is an important, clarifying contribution; it serves as a timely reminder that sequence-based verification of genetic material should be prioritized when either genetic identity or divergence is of importance to the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors extended a previous study of selective response to herbivory in Arabidopsis, in order to look specifically for selection on induced epigenetic variation ("Lamarckian evolution"). They found no evidence. In addition, they re-examined results from a previously published study arguing that environmentally induced epigenetic variation was common, and found that these findings were almost certainly artifactual.

      Strengths:

      The paper is very clearly written, there is no hype, and the methods used are state-of-the-art.

      Weaknesses:

      The result is negative, so the best you can do is put an upper bound on any effects.

      Significance:

      Claims about epigenetic inheritance and Lamarckian evolution continue to be made based on very shaky evidence. Convincing negative results are therefore important. In addition, the study presents results that, to this reviewer, suggest that the 2024 paper by Lin et al. [26] should probably be retracted.

    3. Reviewer #2 (Public review):

      In this paper, the authors examine the extent to which epigenetic variation acquired during a selection treatment (as opposed to standing epigenetic variation) can contribute to adaptation in Arabidopsis. They find weak evidence for such adaptation and few differences in DNA methylation between experimental groups, which contrasts with another recent study (reference 26) that reported extensive heritable variation in response to the environment. The authors convincingly demonstrate that the conclusions of the previous study were caused by experimental error, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the possible role of epigenetic variation in mediating phenotypic variation and adaptation, this is an important, clarifying contribution.

      [Editors' note: We thank the authors for responding to the reviewers' comments.]

    4. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors extended a previous study of selective response to herbivory in Arabidopsis, in order to look specifically for selection on induced epigenetic variation ("Lamarckian evolution"). They found no evidence. In addition, they re-examined results from a previously published study arguing that environmentally induced epigenetic variation was common, and found that these findings were almost certainly artifactual.

      Strengths:

      The paper is very clearly written, there is no hype, and the methods used are state-of-the-art.

      Weaknesses:

      The result is negative, so the best you can do is put an upper bound on any effects.

      Significance:

      Claims about epigenetic inheritance and Lamarckian evolution continue to be made based on very shaky evidence. Convincing negative results are therefore important. In addition, the study presents results that, to this reviewer, suggest that the 2024 paper by Lin et al. [26] should probably be retracted.

      Reviewer #2 (Public Review):

      In this paper, the authors examine the extent to which epigenetic variation acquired during a selection treatment (as opposed to standing epigenetic variation) can contribute to adaptation in Arabidopsis. They find weak evidence for such adaptation and few differences in DNA methylation between experimental groups, which contrasts with another recent study (reference 26) that reported extensive heritable variation in response to the environment. The authors convincingly demonstrate that the conclusions of the previous study were caused by experimental error, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the possible role of epigenetic variation in mediating phenotypic variation and adaptation, this is an important, clarifying contribution.

      I have a few specific comments about the analysis of DNA methylation:

      (1) The authors group their methylation analysis by sequence context (CG, CHG, CHH). I feel this is insufficient, because CG methylation can appear in two distinct forms: gene body methylation (gbM), which is CG-only methylation within genes, and transposable element (TE) and TE-like methylation (teM), which typically involves all sequence contexts and generally affects TEs, but can also be found within genes. GbM and teM have distinct epigenetic dynamics, and it is hard to know how methylation patterns are changing during the experiment if gbM and teM are mixed. This can also have downstream consequences (see point below).

      We thank Reviewer 2 for this suggestion. We usually separate the three contexts because they are set by different enzymes and not because of the general process or specific function. It would indeed be informative to group DMCs into gbM and teM, but as there are many regions with overlaps between genes and transposons, this also adds some complexity. Given that there were very few DMCs, we wanted to keep it simple. Therefore, we wrote that 87.3% of the DMCs were close to or within genes and that 98.1% were close to and within genes or transposons. Together with the clear overrepresentation of the CG context, this indicates that most of the DMCs were related to gbM. We updated the paragraph and specifically referred to gbM to make this point clearer.

      (2) For GO analysis, the authors use all annotated genes as a control. However, most of the methylation differences they observe are likely gbM, and gbM genes are not representative of all genes. The authors' results might therefore be explained purely as a consequence of analyzing gbM genes, and not an enrichment of methylation changes in any particular GO group.

      We are grateful to Reviewer #2 for this suggestion. We updated the GO analysis and defined the background as genes with cytosines that we tested for differences in methylation and which also exhibited overall at least 10% methylation (i.e., one cytosine per gene was sufficient). This resulted in a decrease of the background gene set from 34'615 to 18'315 genes. We still detect enrichment of terms related to epigenetic regulation, transport and growth processes. We have updated the corresponding paragraph accordingly.
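
      The effect of restricting the background can be illustrated with a minimal sketch. It assumes that a gene enters the background if at least one of its tested cytosines reaches 10% methylation, and it uses a one-sided hypergeometric test for enrichment; the function names, the input layout and the choice of test are illustrative assumptions, not the authors' actual pipeline.

```python
from math import comb


def background_genes(cytosines, min_methylation=0.10):
    """Return the set of genes with at least one tested cytosine at or
    above the methylation cutoff.

    `cytosines` is an iterable of (gene_id, methylation_fraction) pairs.
    This input layout is hypothetical and chosen only for illustration.
    """
    return {gene for gene, meth in cytosines if meth >= min_methylation}


def enrichment_p_value(n_background, n_term, n_hits, n_hits_in_term):
    """One-sided hypergeometric p-value P(X >= n_hits_in_term).

    n_background   : genes in the background set
    n_term         : background genes annotated with the GO term
    n_hits         : background genes carrying a methylation difference
    n_hits_in_term : genes that are both annotated and carry a difference
    """
    upper = min(n_term, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_hits - i)
        for i in range(n_hits_in_term, upper + 1)
    )
    return tail / total
```

      Because the p-value depends on the size and composition of the background, the same set of genes can appear enriched against all annotated genes and unremarkable against the methylated subset.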

      Reviewer #1 (Recommendations for The Authors):

      This paper is very clearly written and could be published as-is. The writing could be improved in a few places, for example:

      "We realized that in this recent study (26), potential errors may have confounded treatments with genetic variation. This is because in that study, Lin and colleagues kept lineages 1-to-1 throughout the experiment by single-seed descent."

      "This" in the second sentence seems to refer to the confounding, not your realization thereof.

      I am sure there are more: just give the manuscript a good read-through.

      We thank the Reviewer for pointing out that some sentences may not be clear. We have edited the manuscript and focused on avoiding misleading or unclear wording.

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors should distinguish gbM from teM and repeat the GO term analysis with an appropriate set of control genes.

      See our response to the public reviews above.

      (2) The authors' experimental design should allow them to directly assess whether the rates of epigenetic change are affected by the selective environment. This would require comparison of methylation patterns of individual plants prior to treatment with their progeny (the progeny is what the authors have currently analyzed). This would entail gathering new data, and I don't feel that this analysis is essential, but given the question the authors are addressing (the extent to which a selective environment can induce heritable epigenetic variation), it seems important to test whether the rates of epigenetic change are at all affected by the selection treatment.

      While this is a very valuable recommendation, we cannot currently address it because the person who gathered the data now works at a different university. However, we will keep this in mind for future projects.

      Again, we would like to thank the reviewers for the constructive suggestions that help us to improve the manuscript.

    1. eLife Assessment

      This useful study presents a real-time transcriptomics analysis, with the aim of providing rapid access to sequenced data to reduce the costs associated with Oxford Nanopore long-read technology. The revised manuscript demonstrates the utilities with four sets of experiments with convincing evidence.

    2. Reviewer #2 (Public review):

      Summary:

      Transcriptomics technologies play crucial roles in biological research. Technologies based on second-generation sequencing, such as Illumina RNA-seq, encounter significant challenges due to the short reads, particularly in isoform analysis. In contrast, third-generation sequencing technologies overcome the limitation by providing long reads, but they are much more expensive. The authors present a useful real-time strategy to minimize the cost of RNA sequencing with Oxford Nanopore Technologies (ONT). The revised manuscript demonstrates the utilities with four sets of experiments with convincing evidence: (1) comparison between two cell lines; (2) comparison of RNA preparation procedures; (3) comparison between heat-shock and control conditions; (4) comparison of genetically modified yeast strains. The strategy will probably guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way, benefiting both fundamental research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequencing with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that sequencing can be terminated once sufficient data has been collected to save cost. In this study, the authors demonstrate how to perform comprehensive quality control during sequencing. Their results indicate that the real-time strategy is effective across different species and RNA preparation methods. The revised manuscript addresses most of the major and minor limitations identified in the previous version, including: (1) explicitly detailing the methodology for isoform analysis and presenting the corresponding results; (2) increasing sample sizes and providing a clear explanation of related considerations; (3) clarifying the issue of sequential analysis; and (4) incorporating a new heat-shock experiment that better reflects real-world biological research.

      Weaknesses:

      A key advantage of RNA sequencing using ONT is its ability to facilitate isoform analysis. The primary strength of real-time analysis lies in its potential to reduce costs for researchers while enabling significant biological discoveries related to isoforms. Although the authors explicitly describe their approach to isoform analysis and introduce a new experiment in the revised manuscript, the study still lacks a concrete example that clearly demonstrates the substantial impact of their tool and strategy. While such an example may be beyond the intended scope of the current work, its absence limits a better assessment of the significance of the findings, because the evaluation of a methodological approach ultimately depends on the additional scientific value it provides in research. It is possible that the full potential of this tool will be demonstrated in future studies by the authors or other researchers.

      Furthermore, while the tool integrates a set of state-of-the-art methods, it does not introduce any novel methods. Consequently, the strength of evidence can be raised to "convincing".

    3. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors developed three case studies:

      (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa)

      (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments)

      (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression). Sequencing was performed using the Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to show the use of the tool developed by them in data generated by ONT, evidencing the versatility of the tool and the possibility of cost reduction, since sequencing by ONT can be stopped at any time once enough data have been collected.

      Strengths:

      Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community, and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.

      Weaknesses:

      In relation to the RNA-Seq analyses, for a proper statistical analysis, a greater number of replicates should have been performed. The experiments were conducted with a minimal number of replicates (2 replicates for case study 1 and 2 and 3 replicates for case study 3).

      We have addressed this issue by performing two new sets of experiments: a HEK293 vs HeLa comparison with 10 replicates per condition and a heat-shocked vs non-heat-shocked comparison with 6 replicates per condition. In the case of the HEK293 vs HeLa comparison, we kept the 2 replicates per condition comparison to demonstrate the effect of a limited number of replicates, simulating an early-stage evaluation of the experimental approach to obtain valuable quality control metrics. Nevertheless, we show that relevant and reproducible data can be obtained even with a lower number of replicates (2 replicates per condition), compared to a higher number (10 replicates), across both PromethION and MinION sequencing platforms.

      Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.

      We completely agree with the reviewer. In the 10-replicate HEK vs HeLa experiment, we collected similar data to what was presented in Supplementary Material 2. We chose to include this information to highlight the experimental variability that can arise during Nanopore-seq library preparation, particularly with cDNA synthesis. This type of information is not often highlighted in Nanopore-based studies, yet it is crucial to be aware of such differences. Despite these variations, we identified a consistent set of DEGs across comparisons of low versus high replicate numbers. Importantly, NanopoReaTA successfully provided real-time monitoring (e.g. detected number of genes per replicate/condition), which allows for informed decision-making regarding the next steps in sequencing-based experiments.

      Reviewer #2 (Public Review):

      Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.

      Weaknesses:

      However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:

      (1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), and only did little analysis on isoform level and even nothing in any figures in the main text. Sample clustering and gene-level DEA can be easily and well done using mRNA-seq at a much cheaper cost. Even for initial data quality checking, mRNA-seq can be first done in Illumina MiSeq/NextSeq which is quick, before deep sequencing in HiSeq/NovaSeq. The real power of third-generation RNA sequencing is the isoform analysis due to the long read length. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real-time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.

      We are aware that isoform analysis is one of the powers of real-time monitoring of long-read data, especially with Nanopore-seq. That is why we have included pipelines such as DRIMSeq and DEXSeq, which could provide valuable information about differential transcript usage (i.e. isoforms). However, interpreting the results in a biologically meaningful context, particularly regarding the role of specific isoforms, remains challenging. This is especially relevant as our main goal is to demonstrate NanopoReaTA's utility as a real-time transcriptomic tool that offers valuable quality control and meaningful insights. Nevertheless, in the heat-shock experiments, we have identified one isoform that was differentially expressed and included it in the main figure. We hope that with the right experimental setup, users could use the incorporated tools for meaningful isoform analyses.

      (2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics. But a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples always contain many outliers due to many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10, i.e., 20 samples in total. Will the performance still hold when the sample number is increasing? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how will the real-time analysis be adjusted? These questions are quite useful for researchers who plan to use ONT.

      We thank the reviewer for their suggestion. We performed the suggested experiment for the HEK293 vs HeLa comparison, taking 10 replicates per condition, and acquired the data during sequencing. As shown in the results (Figure 2), the performance held very well, from the first hour up until the 24-hour mark. In theory, the maximum number of barcodes that can be integrated in a sequencing run can be used for the pairwise comparison. We are using a 24-barcode kit (provided by ONT); therefore, we can include up to 12 replicates per condition. We are aware that there is a 96-barcode kit that could be used as well. However, it is important to note that with more samples integrated in the sequencing run, fewer reads will be generated per sample. Therefore, it is important to plan the number of replicates per sequencing run carefully.
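
      The trade-off between the number of barcoded samples and the reads available per sample can be sketched as follows. This is a back-of-the-envelope illustration that assumes reads are split evenly across barcodes, which real runs only approximate; the function names and the total run yield used in the loop are hypothetical.

```python
def max_replicates_per_condition(n_barcodes, n_conditions=2):
    """Largest equal number of replicates per condition that fits on one run."""
    return n_barcodes // n_conditions


def expected_reads_per_sample(total_reads, n_samples):
    """Average reads per sample, assuming an even split across barcodes."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


# Hypothetical run yield, used only to show the scaling.
TOTAL_READS = 12_000_000

for kit_size in (24, 96):
    replicates = max_replicates_per_condition(kit_size)
    per_sample = expected_reads_per_sample(TOTAL_READS, kit_size)
    print(f"{kit_size} barcodes: up to {replicates} replicates per condition, "
          f"about {per_sample:,.0f} reads per sample")
```

      Under this even-split assumption, moving from a 24-barcode to a 96-barcode kit quarters the reads per sample for the same run yield.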

      (3) According to the manuscript, real-time analysis checks the sequencing data at a few time points. This is usually called sequential analysis or interim analysis in statistics, and it is usually performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.

      We would like to respond to this comment by addressing two points:

      1) Quality control: During the analysis we offer two main statistics, which enable scientists to assess the experimental development. For each iteration, the change in relative gene counts per sample is computed to assess its convergence towards 0. Moreover, for each iteration, the number of detected genes per sample is computed to assess whether the number of detected reads is saturated. These metrics allow the user to independently assess whether samples within the experimental development reach a stable state, to reveal a meaningful timepoint of data evaluation.

      2) Sequential analysis: One solution to lower the type I error during sequential analysis is using the Pocock boundary, a systematic lowering of the p-value threshold depending on the number of interim analyses. We offer in NanopoReaTA a custom choice of the p-value threshold during the analysis. This allows researchers to set their parameters as needed.
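
      The inflation of the type I error under repeated looks, and the effect of a Pocock-type threshold, can be illustrated with a small simulation. This is a sketch for a single two-sided z-test with equally spaced looks under the null hypothesis, not a description of what NanopoReaTA computes; the critical values are the published Pocock constants for an overall alpha of 0.05, and the function name and defaults are hypothetical.

```python
import math
import random

# Pocock critical values for a two-sided overall alpha of 0.05 and
# K equally spaced looks (K = 1 is the ordinary single-look test).
POCOCK_Z = {1: 1.960, 2: 2.178, 3: 2.289, 4: 2.361, 5: 2.413}


def familywise_error(n_looks, z_crit, n_sim=20000, seed=0):
    """Estimate the probability of at least one false rejection when the
    same accumulating data are tested at each of `n_looks` interim looks.

    Data are simulated under the null: each look adds one standard-normal
    increment, and the test statistic at look k is the cumulative sum
    divided by sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for k in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / math.sqrt(k)
            if abs(z) > z_crit:
                rejections += 1
                break  # stop at the first (false) rejection
    return rejections / n_sim


naive = familywise_error(5, 1.960)          # unadjusted threshold at every look
adjusted = familywise_error(5, POCOCK_Z[5])  # Pocock threshold at every look
print(f"5 looks, unadjusted: {naive:.3f}; Pocock-adjusted: {adjusted:.3f}")
```

      With five looks, the unadjusted threshold rejects far more often than the nominal 5%, whereas the Pocock value keeps the overall rate close to it. In a differential expression setting the adjustment would sit on top of the usual multiple-testing correction across genes, which this sketch does not model.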

      (4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.

      We took the suggestion of Reviewer #2 (from the recommendations for authors) to perform a heat-shock experiment comparing heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, one hour after sequencing initiation, had already identified three DEGs (including HSPA1A, DNAJB1, and HSP90AA1) known to be upregulated in heat shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). Therefore, we illustrate how NanopoReaTA can capture biologically relevant insights in real time.

      Reviewer #1 (Recommendations for The Authors):

      (1) The comparison between two different human cell lines doesn't have much biological relevance. It would be more interesting and useful to evaluate the genes and transcripts expressed from the same cell in different conditions.

      As mentioned previously, we conducted a heat-shock experiment comparing heat-shocked and non-heat-shocked cells within the same cell line (HEK293). We observed reliable results within one hour of initiating the sequencing.

      (2) Increase the number of replicates to give greater confidence in the results.

      We have addressed the replicate issue by performing two new sets of experiments: HEK293 vs HeLa with 10 replicates per condition and heat-shocked vs non-heat-shocked with 6 replicates per condition. In both cases, we obtained reliable and reproducible results (even when comparing with a lower number of replicates).

      (3) One of the advantages of performing Nanopore sequencing is the possibility of sequencing RNA molecules directly. It would be interesting to test the real-time analysis strategy in parallel using direct RNA sequencing if it is possible.

      That is a great point. In theory, it would be possible to perform real-time differential gene expression on direct RNA data (since the pipeline for such analysis is already integrated in NanopoReaTA); however, the limiting factor is the lack of multiplexing. To perform real-time transcriptomic analysis with direct RNA-seq data, one would need to sequence at least 4 flow cells (MinION or PromethION), each containing one sample (2 flow cells per condition to perform pairwise transcriptomic analyses). Although such an analysis is possible, this scenario would not be cost-effective, as it would significantly increase the costs for the amount of data gathered. We are aware that ONT is planning to release a multiplexing option for direct RNA-seq in the future. We have integrated the option of direct RNA-seq analyses for the day that such an option becomes available, so that users will be able to perform real-time transcriptomic analysis with dRNA-seq data.

      Some minor weaknesses are below:

      (4) With respect to the text as a whole, the authors should be more careful with standardization, such as mL/ml and uL/ul, Ribominus/RiboMinus.

      We have standardized the nomenclature to µL, mL and Ribominus (due to trademark).

      (5) Set up paragraphs on page 9 and throughout the text when necessary.

      We have set the suggested paragraphs on page 9 and throughout the text.

      (6) Please, check the word form in the sentence: "To isolate the RNA form the RiboMinus™ supernatant.."

      The word has been corrected.

      (7) In order to make clear to the reader at the outset, I suggest including in the methodology how many biological replicates were performed for each cell type studied (cell lines and yeast strains).

      For the cell lines, we have now included the number of replicates used for each cell line. We have also included this for the yeast setups.

      (8) Please, check the Supplementary Tables as the word VERDADEIRO has not been translated (TRUE) in Supplementary Table 1.

      This issue appears to be influenced by the language settings configured on the viewer's computer.

      (9) On page 17, I suggest including the absorbance used to measure RNA concentration in HEK293 and HeLa cell lines. Also, I suggest including how the quality of the RNA extracted from the cell cultures and yeast strains was determined. Was the ratio 260/280 and 260/230 calculated? Given that the material was extracted with Trizol, which has phenol and chloroform in its composition, it would be important to evaluate the quality of the RNA, especially by calculating the 260/230 ratio.

      We have included a statement regarding the concentrations and quality of RNA in the “RNA isolation” section within the material and methods.

      (10) On page 18, the topic of Selective purification of ribosomal-depleted (RiboMinus) and ribosomal-enriched (RiboPlus) transcripts needs to be better detailed, especially in the last two sentences. For example: "The pooled bead samples (containing the rRNA) were further processed with Trizol RNA isolation to complete the purification." This sentence should be detailed to make it clear that this procedure is what you call ribosomal-enriched (RiboPlus).

      Qualitative analysis of the material was performed after rRNA depletion and enrichment.

      We have made these sentences clearer.

      (9) On the topic of Direct cDNA-native barcoding Nanopore library preparation and sequencing, in the following sentences: "Concentration determination (1 μl) and adapter ligation using 5 μL NA, 10 μL NEBNext Quick Ligation Reaction Buffer (5X), and 5 μL Quick T4 DNA Ligase (NEB, cat # E6056) were performed. Pooled library purification with 0.7X AMPure XP Beads resulted in a final elution volume of 33 μl EB. Concentration of the pooled barcoded library was determined using Qubit (1 μl)."

      Two concentration determinations were performed, before and after adapter ligation. I suggest writing one sentence for concentration determination and another for adapter ligation.

      We applied the reviewer’s suggestion.

      (11) In the section Experimental Design in Results, the first sentences are part of the methodology and are described in materials and methods. I suggest removing it from the results and rewriting the text. Results of the RNA extraction methodology and library preparation were shown in supplementary material. Thus, the authors could mention that the results were presented in supplementary material.

      We have revised this section to remove the details of RNA extraction and library preparation, focusing instead on the pipeline and experimental setups. The methodology is outlined in Figure 1, as well as in the materials and methods and the supplementary figures for each experimental setup.

      Reviewer #2 (Recommendations For The Authors):

      For major weakness 4 described in the Public Review, the authors could try experiments like:

      (1) comparison between females and males of tissues or primary cells; or

      (2) comparison between cell lines before and after heat shock.

      They are easy to perform and much more similar to real experimental designs for discovery, and the authors may actually have some new findings because usually people do not do much investigation on the isoform level using mRNA-seq.

      We thank the reviewer for their suggestions. We performed the heat-shock experimental comparison between heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, already at one hour post-sequencing initiation, identified three DEGs including HSPA1A, DNAJB1, and HSP90AA1, which have been reported to be upregulated under heat-shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). We have identified one differentially expressed isoform and included it in the main figure.

      There are two minor weaknesses:

      (1) Many figure numbers in the main text are wrong, including:

      Page 4, "similarity plot and principal component analysis (PCA) (Figure 1B, 1C)";

      Page 7, "same intervals as mentioned earlier (Figure 1A)", and "Next, we inspected the PCA and dissimilarity plots (Figure 2B";

      Page 10, "process (Supplementary Figure 19A) until the 24-hour PSI mark point (Figure 9B", and "NEW1 was the sole differentially expressed gene (Figure 9D)".

      The authors should be more careful about this. It is very confusing for readers.

      We have addressed these points in the text.

      (2) The texts in the figures are too small to recognize, especially in Figures 4 and 5. The reason is that there are too many sub-figures in one figure. Is that really necessary to put more than 20 sub-figures in one? The authors should better summarize their results. For example, remove sub-figures with little information; do not show figures with the same styles again and again in the main text and just summarize them instead.

      We thank the reviewer for the suggestion. We have updated the figure to focus on the most relevant comparisons (new1Δ-pEV vs. WT-pEV and rkr1Δ-pEV vs. WT-pEV), providing a clearer and more realistic comparison between mutant and wild-type conditions in the main figure. Additionally, a summary and all related comparisons are included in Supplementary Documents S4 and S5. We believe these supplementary figures are essential to demonstrate NanopoReaTA's capabilities as a quality control tool, effectively detecting expected transcriptomic alterations in real-time.

    1. eLife Assessment

      This study uses all-optical electrophysiology methods to provide a valuable insight into the organization of cortical networks and their ability to balance the activity of groups of neurons with similar functional tuning. The all-optical approach used in this study is impressive and the claim that the effects of optical stimulation correspond to a specific homeostatic mechanism is solid. The work will be of interest to neurobiologists and to developers of optical approaches for interrogating brain function.

    2. Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved.

      Weaknesses:

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset.

    3. Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precise ability of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such an effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of out-of-focus unwanted excitation (Supplementary Figure 1).

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response.

      One surprising result is the clear boundary between assemblies: a rebalancing process occurring in one assembly does not affect the response in another assembly comprising cells tuned to a different frequency. However, this is slightly challenged by the data shown in Figure 3.

      Figure 3B-left, for example, shows that, compared to controls, non-target 16 kHz-preferring neurons only decrease their response to a 16 kHz pure tone when the cells targeted by the opto stimulation also prefer 16 kHz, but not when the targeted cells prefer 54 kHz. However, the inverse is not entirely true. Again compared to controls, Figure 3B (right) shows that non-target 54 kHz-preferring neurons decrease their response to a 54 kHz pure tone when the targeted cells also prefer 54 kHz; however, they also tend to be inhibited when the targeted cells prefer 16 kHz.

      The authors suggest this may be due to the partial activation of 54 kHz-preferring cells by 16 kHz tones and propose examining the response of highly selective neurons. The results are shown in Figure 3F. It would have been more logical to show the same results as in Figure 3B, but with the left part restricted to highly 16 kHz-selective cells and the right part to highly 54 kHz-selective cells. However, the authors chose to pool all responses to 16 kHz and 54 kHz tones in every triplet of conditions (control, opto stimulation on 16 kHz-preferring cells and opto stimulation on 54 kHz-preferring cells), which blurs the result of the analysis.

    4. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank you and the reviewers for the valuable feedback on the first version of the manuscript. We have now addressed all of the issues raised by the reviewers, mostly by implementing the suggested changes and clarifying important details in the revised version of the manuscript. A detailed response to each comment is provided in the rebuttal letter. Briefly, the main changes were as follows:

      - We changed homeostatic balance to network balance especially when describing the main finding as the response changes induced by the stimulation occurred on a fast timescale. We speculate the sustained changes observed in the post-stimulation condition are the result of homeostatic mechanisms.

      - We added additional verification of the target stimulation effect by adding a supplementary result showing its effect between the target and off-target z-planes, as well as demonstrating the minimal impact of the imaging laser on rsChRmine.

      - We added a simple toy model illustrating suppression specifically applied to co-tuned cells that yields the response amplitude decrease, to further support our findings.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population.

      Weaknesses:

      I have several main concerns about the interpretation of these data:

      (1) The premise of the paper suggests that sensory responses are noisy at the level of neurons, but that population activity is reliable and that different neurons may participate in sensory coding on different trials. However, no analysis related to single trial variance or overall stability of population coding is provided. Specifically, showing that population activity is stable across trials in terms of total activity level or in some latent low dimensional representation would be required to support the concept of "homeostatic balancing".

      Thank you for raising an important point. We agree that ‘homeostatic balancing’ may not be the best term to explain the main results. We have now toned down the homeostatic plasticity aspect in explaining the main result and changed the term to a simple ‘network balance’, potentially arising from various factors including rapid synaptic plasticity. We speculate that the persistent activity of co-tuned cells in the post-stimulation session, rather than a rapid return of their responses to baseline, is a result of homeostatic balance. Relevant changes are implemented throughout the manuscript, including the Introduction (e.g., lines 76-78) and Discussion sections (e.g., lines 453-456).

      (2) Rebalancing would predict either that the responses of stimulated neurons would remain A) elevated after stimulation due to a hebbian mechanism or B) suppressed due to high activity levels on previous trials, a homeostatic mechanism. The authors report suppression in targeted neurons after stimulation blocks, but this appears similar to all other non-stimulated neurons. How do the authors interpret the post-stimulation effect in stimulated neurons?

      It is true that there was no post-stimulation response change in either co-tuned or non-co-tuned neurons, in both stimulation and control sessions. This could be because neuronal activity had already adapted and decreased sufficiently from the consecutive presentation of the acoustic stimuli themselves. However, we still think that if the response decrease in co-tuned non-stimulated neurons were driven by stimulation alone, without homeostasis, their responses should at least bounce back during the post-stimulation session. We agree that further investigation would be required to confirm such an effect. We elaborated on this as another discussion point in the Discussion section (lines 457-464).

      (3) The authors suggest that ACtx is different from visual cortex in that neurons with different tuning properties are intermingled. While that is true at the level of individual neurons, there is global order, as demonstrated by the authors' own widefield imaging data and others at the single cell level (e.g. Tischbirek et al. 2019). Generally, distance is dismissed as a variable in the paper, but this is not convincing. Work across multiple sensory systems, including the authors' own work, has demonstrated that cortical neuron connectivity is not random but varies as a function of distance (e.g. Watkins et al. 2014). Better justification is needed for the spatial pattern of neurons that were chosen for stimulation. Further, analyses that account for center of mass of stimulation, rather than just the distance from any stimulated neuron would be important to any negative result related to distance.

      Thank you for the further suggestion regarding the distance matter. While Watkins et al. 2014 and Levy and Reyes (2012) showed stronger connectivity for nearby cells as well as for more distant patches, on a functional level, Winkowski & Kanold 2013 showed high frequency heterogeneity, especially in L2/3, which we targeted for imaging in this study. Thus, connected cells can have varied tuning, consistent with spine imaging (Konnerth paper). We have now also calculated the distance based on the center of mass of the target cells as an additional verification of the distance effect, and still observed no distance-related stimulation effect. We have now replaced Figure 4B with the result from the center-of-mass calculation.

      (4) Data curation and presentation: Broadly, the way the data were curated and plotted makes it difficult to determine how well-supported the authors' claims are. In terms of curation, the removal of outliers 3 standard deviations above the mean in the analysis of stimulation effects is questionable. Given the single-cell stimulation data presented in Figure 1, the reader is led to believe that holographic stimulation is quite specific. However, the justification for removing these outliers is that there may be direct stimulation 20-30 µm from the target. Without plotting and considering the outliers as well, it is difficult to understand if these outsized responses are due to strong synaptic connections with neighboring neurons or rather just direct off-target stimulation. Relatedly, data presentation is limited to the mean + SEM for almost all main effects and pre-post stimulation effects are only compared indirectly. Whether stimulation effects are driven by just a few neurons that are particularly suppressed or distinct populations which are suppressed or enhanced remains unclear.

      Thank you for pointing this out. We now specifically removed neighboring cells that are < 20 µm from the target point, and we observed similar results. We replaced all the relevant figures, texts, and statistical results to ensure that the exclusion was specific to overlapping neighboring cells.
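
      The distance-based exclusion described above can be sketched as follows. This is a minimal illustration in Python, not the authors' analysis code: the function name `exclude_near_targets`, the coordinate format (x, y positions in µm within the imaging plane), and the choice to keep cells lying exactly at the threshold are all assumptions made for the example.

```python
import math


def exclude_near_targets(cell_xy, target_xy, min_separation_um=20.0):
    """Return indices of non-target cells kept after distance-based exclusion.

    A cell is dropped when it lies closer than `min_separation_um` to ANY
    target cell. Coordinates are assumed to be (x, y) tuples in micrometers.
    """
    kept = []
    for idx, cell in enumerate(cell_xy):
        # Distance to the nearest target decides whether the cell is excluded.
        nearest = min(math.dist(cell, target) for target in target_xy)
        if nearest >= min_separation_um:
            kept.append(idx)
    return kept


# Hypothetical positions: the first two cells sit within 20 µm of the target.
cells = [(0.0, 0.0), (10.0, 0.0), (30.0, 0.0), (100.0, 100.0)]
targets = [(0.0, 5.0)]
kept_indices = exclude_near_targets(cells, targets)
```

      Unlike a response-based criterion such as removing values more than 3 standard deviations above the mean, this rule depends only on cell position, so strongly responding cells far from the targets remain in the analysis.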

      Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The conclusions of this paper are very interesting but some aspects of the study including methods for optogenetic stimulation, statistical analysis of the results and interpretation of the underlying mechanisms need to be clarified and extended.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. In the manuscript, several methodological aspects are not sufficiently described to allow a proper understanding.

      (i) The use of ChRmine together with GCaMP8s has been reported as problematic as the 2Ph excitation of GCaMP8s also excites the opsin. Here, the authors use a red-shifted version of ChRmine to prevent such cross excitation by the imaging laser. To be convincing, they should explain how they controlled for the absence of rsChRmine activation by the 940 nm light. Showing the fluorescence traces immediately after the onset of the imaging session would ensure that neurons are not excited as they are imaged.

      Thank you for pointing this out. We realized that an important reference was omitted. Kishi et al. 2022 validated the efficacy of rsChRmine compared to ChRmine. In that paper, they compared regular ChRmine and rsChRmine activity at different wavelengths and settings and showed the efficiency of rsChRmine with reduced optical crosstalk. This reference is now included in the manuscript (line 98). We also checked the spontaneous baseline activity that lasted about 10 s before any sound presentation and observed relatively stable activity throughout, rather than any activation related to imaging session onset, which is also similar to what we see in another group of GCaMP6s transgenic animals.

      Author response image 1.

      Baseline fluorescence activity across cells within FOVs from AAV9-hSyn-GCaMP8s-T2A-rsChRmine injected mice (top) and CBA X Thy1-GCaMP6s F1 transgenic mice (bottom). Fluorescence levels and activity patterns remain similar, suggesting no evident imaging laser-induced activation from rsChRmine. Note that GCaMP8s examples are smoothed by using moving average of 4 points as GCaMP8s show faster activity.
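
      The 4-point moving average mentioned in the caption can be illustrated with a short Python sketch. The caption does not state whether the window was trailing or centered, nor how the first samples were handled, so the trailing window and the shortened window at the start of the trace are assumptions, as is the function name `moving_average`.

```python
def moving_average(trace, window=4):
    """Smooth a fluorescence trace with a trailing moving average.

    Each output sample is the mean of the current sample and up to
    `window - 1` preceding samples. The first few outputs average over
    fewer points because no earlier samples exist (an assumed edge rule).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(trace):
        running_sum += value
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= trace[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical trace values (arbitrary fluorescence units).
example = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=4)
```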

      (ii) Holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such an effect, one could in principle decouple the imaging and the excitation planes, and check for the absence of out-of-focus unwanted excitation.

      We further verified whether the laser power at the targeted z-plane influences cells’ activity at nearby z-planes. As the Reviewer pointed out, the previous x- and y-axis shifts were tested by single-cell stimulation. This time, we stimulated five cells simultaneously to match the actual experimental setup and assess potential artifacts in other planes. We observed no stimulation-driven activity increase in cells at a z-plane shifted by 20 µm (Supplementary Figure 1). This confirms that the holographic stimulation accurately manipulates the pre-selected target cells and that the effects we observe are not likely due to out-of-focus stimulation artifacts. It is true that not all pre-selected cells showing significant response changes prior to the main experiment are effectively activated at every trial during the experiments. We varied the target cell distances across FOVs, from nearby cells to those farther apart within the FOV. We have not observed a significant relationship between the target cell distances and the stimulation effect. Lastly, cells within < 20 µm of the target were excluded to prevent potential excitation due to the holographic stimulation power. Given the spontaneous movements of the FOV during imaging sessions due to the animal’s movement, despite our efforts to minimize them, we believe that any excitation of these neighboring neurons would come directly from the stimulation rather than from the light pattern artifact itself.

      (iii) The control shown in Figure 1B is intended to demonstrate the precision of the optogenetic stimulation: when the stimulation spiral is played at a distance larger or equal to 20 µm from a cell, it does not activate it. However, in the rest of the study, the stimulation is applied with a holographic approach, targeting 5 cells simultaneously instead of just one. As the holographic pattern of light could produce out-of-focus hot spots (absent in the single cell control), we don't know what is the extent of the contamination from non-targeted cells in this case. This is important because it would determine an objective criterion to exclude non-targeted but excited cells (last paragraph of the Result section: "For the stimulation condition, we excluded non-target cells that were within 15 µm distance of the target cells...")

      Neurons that are highly sensitive to a certain frequency also show the greatest adaptation effect, which can be observed in the control condition. Therefore, the greater amplitude change in highly sensitive neurons is primarily related to neuronal adaptation to the information they are sensitive to. However, when co-tuned target neurons are stimulated, other co-tuned non-target neurons show a significantly greater amplitude decrease compared to either stimulation of non-co-tuned target neurons or the control (the latter did not meet the significance level).

      We also tried a more rigorous criterion of 20 µm instead of 15 µm, as you pointed out, since the spiral size was 20 µm. This yielded an even more significant decrease in response amplitude due to the stimulation, only in co-tuned non-target neurons processing their preferred frequency.

      (2) A strength of this study comes from the design of the experimental protocol used to compare the activity in non-target co-tuned cells when the optogenetic stimulation is paired with their preferred tone versus a non-preferred pure tone. The difficulty lies in the co-occurrence of the rebalancing process and the adaptation to repeated auditory stimuli, especially when these auditory stimuli correspond to a cell's preferred pure tones. To distinguish between the two effects, the authors use a comparison with a control condition similar to the optogenetic stimulation conditions, except that the laser power is kept at 0 mW. The observed effect is shown as an extra reduction of activity in the condition with the optogenetic paired with the preferred tone, compared to the control condition. The specificity of this extra reduction when stimulation is synchronized with the preferred tone, but not with a non-preferred tone, is a potentially powerful result, as it points to an underlying mechanism that links the assemblies of cells that share the same preferred pure tones.

      The evidence for this specificity is shown in Figure 3A and 3D. However, the universality of this specificity is challenged by the fact that it is observed for 16kHz preferring cells, but not so clearly for 54kHz preferring cells: these 54kHz preferring cells also significantly (p = 0.044) reduce their response to 54kHz in the optogenetic stimulation condition applied to 16kHz preferring target cells compared to the control condition. The proposed explanation for this is the presence of many cells with a broad frequency tuning, meaning that these cells could have been categorized as 54kHz preferring cells, while they also responded significantly to a 16kHz pure tone. To account for this, the authors divide each category of pure tone cells into three subgroups with low, medium and high frequency preferences. Following the previous reasoning, one would expect at least the "high" subgroups to show a strong and significant specificity for an additional reduction only if the optogenetic stimulation is targeted to a group of cells with the same preferred frequency. Figure 3D fails to show this. The extra reduction for the "high" subgroups is significant only when the condition of opto-stimulation synchronized with the preferred frequency is compared to the control condition, but not when it is compared to the condition of opto-stimulation synchronized with the non-preferred frequency.

      Therefore, the claim that "these results indicate that the effect of holographic optogenetic stimulation depends not on the specific tuning of cells, but on the co-tuning between stimulated and non-stimulated neurons" (end of paragraph "Optogenetic holographic stimulation decreases activity in non-target co-tuned ensembles") seems somewhat exaggerated. Perhaps increasing the number of sessions in the 54kHz target cell optogenetic stimulation condition (12 FOV) to the number of sessions in the 16kHz target cell optogenetic stimulation condition (18 FOV) could help to reach significance levels consistent with this claim.

      We had previously also tested this by randomly subselecting 12 FOVs from the 16 kHz stimulation condition to match the number of FOVs between the two groups, and did not see any difference in the results. However, to further confirm the results, we have now added three more datasets for the 54 kHz target cell stimulation condition (now 15 FOVs), which yielded a similar outcome. We have updated the statistical values to include the added datasets.

      (3) To interpret the results of this study, the authors suggest that mechanisms based on homeostatic signaling could be important to allow the rebalancing of the activity of assemblies of co-tuned neurons. In particular, the authors try to rule out the possibility that inhibition plays a central role. Both mechanisms could produce effects on short timescales, making them potential candidates. The authors quantify the spatial distribution of the balanced non-targeted cells and show that they are not localized in the vicinity of the targeted cells. They conclude that local inhibition is unlikely to be responsible for the observed effect. This argument raises some questions. The method used to quantify spatial distribution calculates the minimum distance of a non-target cell to any target cell. If local inhibition is activated by the closest target cell, one would expect the decrease in activity to be stronger for non-target cells with a small minimum distance and to fade away for larger minimum distances. This is not what the authors observe (Figure 4B), so they reject inhibition as a plausible explanation. However, their quantification doesn't exclude the possibility that non-target cells in the minimum distance range could also be close and connected to the other 4 target cells, thus masking any inhibitory effect mediated by the closest target cell. In addition, the authors should provide a quantitative estimate of the range of local inhibition in layers 2/3 of the mouse auditory cortex to compare with the range of distances examined in this study (< 300 µm). Finally, the possibility that some target cells could be inhibitory cells themselves is considered unlikely by the authors, given the proportions of excitatory and inhibitory neurons in the upper cortical layers. On the other hand, it should be acknowledged that inhibitory cells are more electrically compact, making them easier to be activated optogenetically with low laser power.

      Minimum distance is defined as the smallest distance from a non-target cell to any of the target cells. Thus, if the effect were due to local inhibition, the closest target cell would likely have affected the non-target cells’ response changes. We also calculated the distance based on the center of mass of the target cells as an additional verification of the distance effect, based on both Reviewers’ comments, and still observed no distance-related stimulation effect. The result is now updated in Figure 4B.
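
      The two distance measures discussed here can be compared with a small Python sketch. It is illustrative only: the function names are hypothetical, positions are assumed to be (x, y) coordinates in µm, and the center of mass is taken as the unweighted mean of the target positions, which the response does not specify.

```python
import math


def min_distance_to_targets(cell, targets):
    """Smallest distance from a non-target cell to any single target cell."""
    return min(math.dist(cell, target) for target in targets)


def center_of_mass(targets):
    """Unweighted centroid of the target positions (an assumed definition)."""
    n = len(targets)
    return (sum(x for x, _ in targets) / n, sum(y for _, y in targets) / n)


def distance_to_center_of_mass(cell, targets):
    """Distance from a non-target cell to the centroid of all targets."""
    return math.dist(cell, center_of_mass(targets))


# Hypothetical layout: four targets at the corners of a 100 µm square.
targets = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
central_cell = (50.0, 50.0)   # far from every target, but at the centroid
corner_cell = (0.0, 10.0)     # next to one target, far from the centroid
```

      In this layout the central cell has a minimum distance of about 71 µm but a center-of-mass distance of 0 µm, whereas the corner cell has a minimum distance of 10 µm and a center-of-mass distance of about 64 µm. The two measures can therefore rank the same cells differently, which is why reporting both addresses the concern about influence from multiple targets.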

      Based on previous literature, such as Levy & Reyes 2012, excitatory and inhibitory connectivity is known to extend over a range of around 100 µm. Our results do not show any additional effect for cells at distances below 100 µm. This suggests that the effect is not limited to local inhibition. We also added further speculation on why our results are less likely to be due to increased inhibition, despite the biological characteristics of inhibitory neurons relevant to optogenetic activation.

      Reviewer #3 (Public review):

      Summary:

      The authors optogenetically stimulate 5 neurons all preferring the same pure tone frequency (16 or 54 kHz) in the mouse auditory cortex using a holography-based single cell resolution optogenetics during sound presentation. They demonstrate that the response boosting of target neurons leads to a broad suppression of surrounding neurons, which is significantly more pronounced in neurons that have the same pure tone tuning as the target neurons. This effect is immediate and spans several hundred micrometers. This suggests that the auditory cortical network balances its activity in response to excess spikes, a phenomenon already seen in visual cortex.

      Strengths:

      The study is based on a technologically very solid approach based on single-cell resolution two-photon optogenetics. The authors demonstrate the potency and resolution of this approach. The inhibitory effects observed upon targeted stimulation are clear and the relative specificity to co-tuned neurons is statistically clear although the effect size is moderate.

      Weaknesses:

      The evaluation of the results is brief and some aspects of the observed homeostatic are not quantified. For example, it is unclear whether stimulation produces a net increase or decrease of population activity, or if the homeostatic phenomenon fully balances activity. A comparison of population activity for all imaged neurons with and without stimulation would be instructive. The selectivity for co-tuned neurons is significant but weak. Although it is difficult to evaluate this issue, this result may be trivial, as co-tuned neurons fire more strongly. Therefore, the net activity decrease is expected to be larger, in particular, for the number of non-co-tuned neurons which actually do not fire to the target sound. The net effect for the latter neurons will be zero just because they do not respond. The authors do not make a very strong case for a specific inhibition model in comparison to a broad and non-specific inhibitory effect. Complementary modeling work would be needed to fully establish this point.

      Thank you for raising important points. We agree that the term homeostatic balancing may have been an overstatement. We have toned down the claims regarding homeostatic plasticity and now frame the result as rapid plasticity at the single-trial level. Regardless, the average activity level did not differ among stimulation conditions (control, 16kHz stim, and 54kHz stim), which suggests that the overall activity level was maintained regardless of the stimulation. We added a new figure of the global activity change as Fig. 4A.

      We also added a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells to test our results from the data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For the first holography paper in A1, more information is needed about how holographic stimulation was performed and how stimulation artifacts were avoided or removed from the data set, especially as the text states that the PMTs were left open for the duration of the experiment.

      We further clarified the rationale for leaving the shutter open, which is to avoid mechanical sounds that could activate neurons in the AC. Specifically, we keep the uncaging shutter open because the Bruker default setting (Software version: 5.7) opens and closes the shutter on every iteration of the stimulation, which generates loud mechanical sounds and makes it difficult to determine whether the activation is due to the sound or the stimulation.

      (2) The choice of the dF/F as the primary tool for quantifying data should be better justified. Presumably, cells have very different variances in baseline activity levels and baseline fluorescence levels that create a highly skewed distribution of responses across the population. Further, a

      To take the baseline activity variances into account, we first calculate dF/F normalising to the baseline period (about 330 ms before the sound onset) right before each trial, per cell level. By doing so, we minimize any effect that could have been driven by variable baseline activity levels across neurons.
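
      As an illustration of the per-trial, per-cell normalisation described above, a minimal sketch follows. The frame rate, trace values, and function name are assumptions made for the example and do not come from the manuscript; only the baseline window of about 330 ms before sound onset is taken from the text.

```python
def dff_trial(trace, frame_rate_hz, sound_onset_s, baseline_s=0.33):
    """Compute dF/F for one cell on one trial.

    F0 is the mean fluorescence in the baseline window (about 330 ms)
    immediately before sound onset; every frame is expressed as (F - F0) / F0.
    """
    onset = int(round(sound_onset_s * frame_rate_hz))
    n_base = max(1, int(round(baseline_s * frame_rate_hz)))
    start = onset - n_base
    if start < 0:
        raise ValueError("trace does not contain a full baseline window")
    baseline = trace[start:onset]
    f0 = sum(baseline) / len(baseline)
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]

# Hypothetical trial: 10 baseline frames at F = 100, then 5 frames at F = 150,
# sampled at an assumed 30 Hz with sound onset at frame 10.
raw = [100.0] * 10 + [150.0] * 5
dff = dff_trial(raw, frame_rate_hz=30.0, sound_onset_s=10 / 30)
print(dff[0], dff[-1])
```

      Because F0 is recomputed for every trial and every cell, a cell with a high or drifting resting fluorescence is scaled by its own baseline rather than by a population-wide value.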

      (3) More analysis should be performed to determine why 33% of stimulated cells are not activated, and instead are suppressed during stimulation. Is this related to a cells baseline fluorescence?

      Great point. Although we tried our best to pre-select stimulation-responsive neurons before starting the actual experiments and to head-fix the animals as stably as possible, these neurons do not remain the “best stimulation-responsive neurons” throughout the entire imaging session. There are several possible reasons for this. First, they seem to change their activity levels in response to the optogenetic stimulation after they are exposed to acoustic stimulation. Second, since the AC is on the temporal side, it is likely to be more affected by movements of the animal and its brain throughout the imaging session, which could be larger than in visual cortex or motor cortex. However, 33% of 5 cells is about 1.5 cells, so on average about one cell is missed, although in some sessions all 5 cells are stimulated while other sessions show a clearly less effective holographic stimulation effect.

      We even manually visualised the fluorescence change due to the holographic stimulation before starting any imaging session. Nevertheless, the cells do not remain the ‘best stimulation-responsive cells’ throughout, because we cannot control the natural biological variability of neuronal activity. Regardless, based on the significant stimulation effects observed by presenting different pure tone frequencies as well as delivering different target stimulation and no-stimulation control, we believe that the effect itself is valid. We added these caveats into the manuscript as a further discussion point and things to consider.

      (4) The linear mixed-effects model should include time as a variable as A) the authors hypothesize that responses should be reduced over time due to sensory adaptation and that B) stimulation induced suppression might be dynamic (though they find it is not).

      Since the stimulation effect seems to be independent from trial-by-trial changes among stimulation conditions (Fig. 4) and we now have toned down on the aspect of homeostasis, we kept the current mixed-effect model variables.

      (5) More speculation is needed on why stimulation suppresses responses from the first trial onwards.

      We further speculate that such rapid response changes arise from activity-dependent synaptic changes, driven by the overall network energy shift from optogenetic stimulation, that maintain the cortical circuit balance.

      (6) What does each dot represent in Figure 4a vs. Figure 4B? They are very different in number.

      In 4A, each dot is the average amplitude change for one trial. The number of dots is exactly the same across frequencies, cell groups and conditions, as each dot represents one trial (20 each). The dots may appear to differ in number only because some overlap between frequencies.

      In 4B, each dot is one cell. The Stimulation condition’s 16kHz-preferring cells panel is denser because it naturally had more FOVs and thus more cells to plot. We further clarified these details in the figure legend.

      (7) How sensory responsive neurons were selected should be shown in the figures. Specifically, which fraction of the 30% of most responsive neurons were stimulated should be stated. Depending on the exact yield in the field of view, all or only a minority of strongly sensory responsive neurons are being stimulated, which in either case would color the interpretation of the data.

      We tried varying the FOV as much as possible across sessions to ensure that FOVs are directly in A1 and cover a range of frequencies. If we could not observe more than 80 sound-responsive neurons in the processed suite2p data, we searched for another FOV.

      We now included an example FOV of the widefield imaging we first conducted to identify A1, and another example FOV of the 2-photon imaging where we conducted a short sound presentation session to identify the sensory responsive neurons, as an inset of the ‘Cell selection’ part in Figure 1.

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      - p.4, last line: "of" probably missing "the processing the target..."

      Fixed.

      - p.5, top, end of the first paragraph of this page: Figure 3B and 3E don't show exemplar traces.

      Corrected as Figure 2A and 2D.

      - P.5, first sentence of the paragraph "Optogenetic holographic stimulation increases activity in targeted ensembles": reference to Figure 3A and 3D should rather be Figure 2A and 2D.

      Corrected.

      - P.9, 2nd paragraph: sentence with a strange syntax: "since their response amplitude..."

      Corrected.

      - Figure 2: panels C and F are missing.

      Corrected.

      - p.11, methods: "wasthen" should be "was then".

      Corrected.

      - p.12, analysis: it is not clearly explained why the sound evoked activity is computed based on the 160ms to 660ms after sound onset instead of 0ms to 660 ms. It is likely related to some potential contamination but it should be explicitly explained.

      Because calcium transients are relatively slow, this window more accurately captures the sound-evoked responses. We added this detail.

      - Methods, analysis: the authors should better explain how they conducted the random permutation described in the Figures 1D, 2B and 2E. Which signals were permutated?

      We used random permutation to shuffle the target cell IDs.

      - References 55 and 56 don't explicitly state that excitatory neurons generally have stronger responses to sound than inhibitory neurons.

      Thank you for pointing out this error. We replaced those references with Maor et al. 2016 and Kerlin et al. 2010, showing excitatory neurons show more selective tuning, and also changed the wording more appropriately.

      - It is not explained whether the imaging sessions are performed on awake or anaesthetized animals. It is probably done on awake animals, but then it is not clear what procedure is used to get the animals used to the head restraint. It usually takes a few days for the mice to get used to it, and the stress level is often different at the beginning and end of an experiment. Given the experimental protocol used in the study, in which sessions are performed sequentially and compared to each other, this aspect could play a role. However, the main comparison made is probably safe as it compares a control condition (laser at 0mW) and conditions with optogenetic stimulation, all done with similar sequences of sessions.

      The experiment was conducted on awake animals. Although we did not have any control comparing their state at the beginning and the end of the experiment, they all had a widefield imaging session to identify the A1 region, which uses the same head-fixation setup; thus they are more used to the setup by the time we conduct 2-photon imaging and stimulation. Regardless of the session, if animals show any sign of extra discomfort due to the unfamiliar setup, we keep them there for 10-15 minutes until they are accustomed to the setup and show no movement. If they still show signs of discomfort, we take them out and try again on another day. We now included this detail in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      - Evaluate the global effect of stimulation on the population activity averaged across all neurons (activated and non-activated).

      Thank you for your suggestions. We now included a new Figure 3A that presents the population activity across all responsive cells. The average activity level did not differ among stimulation conditions (control, 16kHz stim, and 54kHz stim).

      - Evaluate with a simple model if a population of neurons with different sound tuning receiving non-specific inhibition would not produce the observed effect.

      Thank you for the suggestion. We generated a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells to test our results from the data. We took a similar range of number of neurons and FOVs to closely simulate the model to the real dataset structure. On 50 simulated calcium traces of neurons (n),

      Trace<sub>n(t)</sub> = R<sub>n(t)</sub> − theta<sub>n</sub> + epsilon<sub>n(t)</sub>

      Where R<sub>n(t)</sub> is a response amplitude from either baseline or stimulation session, theta<sub>n</sub> is a suppression term applied either to all neurons or only to non-target co-tuned neurons, only during the stimulation session, and epsilon<sub>n(t)</sub> is additive noise. Theta was defined based on the average amount of increased activity amplitudes generated from target neurons due to the stimulation, implemented from the real dataset with extra neuron-level jitter. Similar to the real data analyses, we compared the response change between the stimulation and baseline sessions’ trace amplitudes. By comparing two different model outcomes and the real data, we observed a significant effect of the model type (F(2, 2535) = 34.943, p < 0.0001) and a significant interaction between the model type and cell groups (F(2, 2535) = 36.348, p < 0.0001). Applying suppression to only non-target co-tuned cells during the stimulation session yielded a significant response amplitude decrease for co-tuned cells compared to non co-tuned cells (F(1, 2535) = 45.62, p < 0.0001), which resembles the real data. In contrast, applying suppression to all non-target cells led to similar amplitude changes in both co-tuned and non co-tuned neurons (F(1, 2535) = 0.87, p = 0.35), which was not observed in either the real data or the simulated data restricted to co-tuned cell suppression. Therefore, the model predicts correctly that the specific suppression given to only co-tuned neurons drove the real data outcome. All of this information is now added into Methods and Results sections and the figure is added as Figure 3C.
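
      The structure of the model above (response amplitude minus a suppression term plus additive noise) can be sketched as a toy simulation. This is not the authors' implementation: the parameter values, the alternating assignment of neurons to groups, the jitter range, and the function name are all hypothetical, and no statistical test is performed.

```python
import random
from statistics import mean

def simulate_changes(n_neurons=50, n_trials=20, theta=0.2, noise_sd=0.05,
                     specific=True, seed=0):
    """Toy version of Trace_n(t) = R_n(t) - theta_n + epsilon_n(t).

    specific=True  -> theta_n is applied only to co-tuned neurons.
    specific=False -> theta_n is applied to all neurons.
    Returns the mean (stimulation - baseline) amplitude change per group.
    """
    rng = random.Random(seed)
    changes = {"co_tuned": [], "non_co_tuned": []}
    for n in range(n_neurons):
        # Hypothetical grouping: alternate neurons between the two groups.
        group = "co_tuned" if n % 2 == 0 else "non_co_tuned"
        r = rng.uniform(0.5, 1.5)  # hypothetical response amplitude R_n
        suppressed = (group == "co_tuned") or not specific
        # Neuron-level jitter on the suppression term (assumed +/- 20%).
        theta_n = theta * rng.uniform(0.8, 1.2) if suppressed else 0.0
        # Baseline session: no suppression. Stimulation session: minus theta_n.
        base = mean(r + rng.gauss(0.0, noise_sd) for _ in range(n_trials))
        stim = mean(r - theta_n + rng.gauss(0.0, noise_sd) for _ in range(n_trials))
        changes[group].append(stim - base)
    return {g: mean(v) for g, v in changes.items()}

specific_model = simulate_changes(specific=True)
broad_model = simulate_changes(specific=False)
print(specific_model)
print(broad_model)
```

      Under these assumptions, the specific variant yields a larger amplitude decrease in the co-tuned group than in the non co-tuned group, whereas the broad variant yields a similar decrease in both groups, which is the qualitative contrast the response describes.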

    1. eLife Assessment

      In this manuscript, Lim and collaborators present an important system for developing self-amplifying RNA with convincing evidence that it does not provoke a strong host inflammatory response in cultured cells. This approach could be further strengthened going forward by testing these self-amplifying RNAs in an in vivo system.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      Comments on revisions:

      All comments have been thoroughly addressed, and the manuscript has been significantly improved.

    3. Reviewer #3 (Public review):

      Summary:

      Context - this is the 2nd review, of a manuscript that has already undergone some revisions.

      The manuscript explores ways to make self-amplifying RNA (saRNA) more silent through the inclusion of genes to inhibit the innate immune response. The readouts are predominantly expression and cell viability. They take a layered approach, adding multiple genes, as well as altering the capping of the anti-immune genes.

      Strengths:

      As described by the other reviewers, the authors take a stepwise approach to demonstrate that they can lead to sustained expression of the transgene.

      Weaknesses:

      The following weaknesses need some consideration

      (1) The data show sustained expression, but do not directly show amplification. The amount of RFP is constantly decreasing over the time course. There is some evidence for the srIκBα-Smad7-SOCS1 construct. But measuring the RNA itself would be beneficial.

      (2) The end construct is very large - it has 12 genes, this may have manufacturing considerations, affecting the translatability.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      We thank Reviewer #1 for their thorough review and for recognizing both the significance of our work and the potential of our strategy to expand saRNA applications beyond vaccines.

      Weaknesses:

      (1) Impact on Cellular Translation:

      The authors demonstrate that modified saRNAs with additional components enhance transgene expression by inhibiting dsRNA-sensing pathways. However, it is unclear whether these modifications influence global cellular translation beyond the expression of GFP and mScarlet-3 (which are encoded by the saRNA itself). Conducting a polysome profiling analysis or a puromycin labeling assay would clarify whether the modified saRNAs alter overall translation efficiency. This additional data would strengthen the conclusions regarding the specificity of dsRNA-sensing inhibition.

      We thank the Reviewer for this insightful suggestion. We performed a puromycin labeling assay to assess global translation rates (Figure 3—figure supplement 1c). This experiment revealed that the E3 construct significantly reduces global protein synthesis, despite driving high levels of saRNA-encoded transgene expression (Figure 1d, e). In contrast, the E3-NSs-L* construct mitigated this reduction in global translation while maintaining moderate transgene expression. These findings support our hypothesis that E3 enhances transgene output in part by activating RNase L, which degrades host mRNAs and thereby reduces ribosomal competition. We appreciate the Reviewer’s recommendation of this experiment, which has strengthened the manuscript.

      (2) Stability and Replication Efficiency of Long saRNA Constructs:

      The saRNA constructs used in this study exceed 16 kb, making them more fragile and challenging to handle. Assessing their mRNA integrity and quality would be crucial to ensure their robustness.

      Furthermore, the replicative capacity of the designed saRNAs should be confirmed. Since Figure 4 shows lower inflammatory cytokine production when encoding srIkBα and srIkBα-Smad7-SOCS1, it is important to determine whether this effect is due to reduced immune activation or impaired replication. Providing data on replication efficiency and expression levels of the encoded anti-inflammatory proteins would help rule out the possibility that reduced cytokine production is a consequence of lower replication.

      We thank the Reviewer for these valuable suggestions.

      To assess the integrity of the saRNA constructs, we performed denaturing gel electrophoresis (Supplemental Figure 6c). The native saRNA, E3, and E3-NSs-L* constructs each migrated as a single band. The moxBFP, srIκBα, and srIκBα-Smad7-SOCS1 constructs showed both a full-length transcript and a lower-abundance truncated band (Supplemental Figure 6d), suggestive of a cryptic terminator sequence introduced in a region common to these three constructs.

      To evaluate replicative capacity, we performed qPCR targeting EGFP, which is encoded by all constructs. This analysis revealed that the srIκBα-Smad7-SOCS1 construct exhibited lower replication efficiency than both native saRNA and E3. Several factors may contribute to this difference, including the longer transcript length, reduced molar input when equal mass was used for transfection, prevention of host mRNA degradation due to RNase L inhibition, or the presence of truncated transcripts.

      Given these confounding variables, we revised our approach to analyzing cytokine production. Rather than comparing all six constructs together, we split the analysis into two parts: (1) the effects of dsRNA-sensing pathway inhibition (Figure 4a), and (2) the effects of inflammatory signalling inhibition (Figure 4c). For the latter, we compared srIκBα and srIκBα-Smad7-SOCS1 to moxBFP, as these three constructs are more comparable in size, share the same truncated transcript, and all encode L* to inhibit RNase L. This strategy minimizes the likelihood that differences in the cytokine responses are due to variation in replication efficiency.

      (3) Comparative Data with Native saRNA:

      Including native saRNA controls in Figures 5-7 would allow for a clearer assessment of the impact of additional genes on cytokine production. This comparison would help distinguish the effect of the encoded suppressor proteins from other potential factors.

      We thank the Reviewer for this helpful suggestion. We have added the native saRNA condition to Figure 5 as a visual reference. However, due to the presence of truncated transcripts in the constructs designed to inhibit inflammatory signalling pathways, the actual amount of full-length saRNA delivered in these conditions is likely lower than expected, despite using equal total RNA mass for transfection. This complicates direct comparisons with constructs targeting dsRNA-sensing pathways, which do not show transcript truncation. For this reason, native saRNA was included only as a visual reference and was not used in statistical comparisons with the inflammatory signalling inhibitor constructs.

      (4) In vivo Validation and Safety Considerations:

      Have the authors considered evaluating the in vivo potential of these saRNA constructs? Conducting animal studies would provide stronger evidence for their therapeutic applicability. If in vivo experiments have not been performed, discussing potential challenges - such as saRNA persistence, biodistribution, and possible secondary effects - would be valuable.

      (5) Immune Response to Viral Proteins:

      Since the inhibitors of dsRNA-sensing proteins (E3, NSs, and L*) are viral proteins, they would be expected to induce an immune response. Analyzing these effects in vivo would add insight into the applicability of this approach.

      We appreciate the Reviewer’s points regarding in vivo validation and safety considerations. While in vivo studies are beyond the scope of the present investigation, we agree that evaluating therapeutic potential, biodistribution, persistence, and secondary effects will be essential for future translation. We have now included a brief discussion of these considerations at the end of the revised discussion. In ongoing work, we are planning follow-up studies incorporating in vivo imaging and functional assessments of saRNA-driven cargo delivery in preclinical models of inflammatory joint pain.

      Regarding the immune response to viral proteins, we agree that this is an important consideration and have now included a clearer discussion of this limitation in the revised manuscript. Specifically, we highlight that encoding multiple viral inhibitors (E3, NSs, and L*), in combination with the VEEV replicase, may increase the likelihood of adaptive immune recognition via MHC class I presentation. This could lead to cytotoxic T cell–mediated clearance of saRNA-transfected cells, thereby limiting therapeutic durability. We emphasize that addressing both intrinsic cytotoxicity and immune-mediated clearance will be essential for advancing the clinical potential of this platform.

      (6) Streamlining the Discussion Section:

      The discussion is quite lengthy. To improve readability, some content - such as the rationale for gene selection - could be moved to the Results section. Additionally, the descriptions of Figure 3 should be consolidated into a single section under a broader heading for improved coherence.

      Thank you for these helpful suggestions. We have streamlined the Discussion to improve readability and have moved the rationale for gene selection to the results section, as recommended. In addition, we have consolidated the Figure 3 descriptions to improve coherence and to simplify the presentation.

      Reviewer #2 (Public review):

      Summary:

      Lim et al. have developed a self-amplifying RNA (saRNA) design that incorporates immunomodulatory viral proteins, and show that the novel design results in enhanced protein expression in vitro in mouse primary fibroblast-like synoviocytes. They test constructs including saRNA with the vaccinia virus E3 protein and another with E3, Toscana virus NS protein and Theiler's virus L protein (E3 + NS + L), and another with srIκBα-Smad7-SOCS1. They have also tested whether ML336, an antiviral, enables control of transgene expression.

      Strengths:

      The experiments are generally well-designed and offer mechanistic insight into the RNA-sensing pathways that confer enhanced saRNA expression. The experiments are carried out over a long timescale, which shows the enhanced effect of the saRNA E3 design compared to the control. Furthermore, the inhibitors are shown to maintain the cell number, and reduce basal activation factor-α levels.

      We thank Reviewer #2 for their thoughtful and detailed assessment of our manuscript, and for recognizing the mechanistic insights provided by our study. We also appreciate their positive comments on the experimental design, the extended timescale, and the observed effects on transgene expression, cell viability, and basal fibroblast activation factor-α levels.

      Weaknesses:

      One limitation of this manuscript is that the RNA is not well characterized; some of the constructs are quite long and the RNA integrity has not been analyzed. Furthermore, for constructs with multiple proteins, it's imperative to confirm the expression of each protein to confirm that any therapeutic effect is from the effector protein (e.g. E3, NS, L). The ML336 was only tested at one concentration; it is standard in the field to do a dose-response curve. These experiments were all done in vitro in mouse cells, thus limiting the conclusion we can make about mechanisms in a human system.

      Thank you for your detailed feedback. We have added new experiments and clarified limitations in the revised manuscript to address these concerns:

      RNA integrity: We performed denaturing gel electrophoresis on the in vitro transcribed saRNA constructs (Supplemental Figure 7c). Constructs targeting dsRNA-sensing pathways migrated as a single band, while those targeting inflammatory signalling pathways showed both a full-length product and a common, lower-abundance truncated transcript. This suggests that the actual amount of full-length RNA delivered for the constructs inhibiting inflammatory signalling was overestimated. To account for this, we avoided direct comparisons between the two types of constructs and instead focused on comparisons within each type to ensure more meaningful interpretation.

      Confirmation of protein expression: While we acknowledge that direct measurement of each protein would provide additional insight, we believe the functional assays presented offer strong evidence that the encoded proteins are expressed and exert their intended biological effects. Additionally, IRES functionality was confirmed visually using fluorescent protein reporters, supporting the successful expression of downstream genes.

      ML336 concentration–response: We have now performed a concentration–response analysis for ML336 (Figure 8a and b), which demonstrates its ability to modulate transgene expression in a concentration-dependent manner.

      Use of human cells: We agree that testing these constructs in human cells is essential for future translational applications and are actively exploring opportunities to evaluate them in patient-derived FLS. However, previous studies have shown that Theiler’s virus L* does not inhibit human RNase L (Sorgeloos et al., PLoS Pathog 2013). As a result, it is highly likely that the E3-NSs-L* construct will not function as intended in human systems. Addressing this limitation will be a priority in our future work, where we aim to develop constructs incorporating inhibitors specific to human RNase L to ensure efficacy in human cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2c is not indicated.

      Thank you for pointing out this error. It has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The Graphical Abstract is a bit confusing; suggest modifying it to represent the study and findings more accurately.

      We have revised the graphical abstract to improve clarity and better reflect the study’s design and main findings. Thank you for the suggestion.

      (2) The impact of this paper would be greatly improved if these experiments were repeated, at least partially, in human cells. The rationale for mouse cells in vitro is unclear.

      The rationale for developing constructs targeting mouse cells is based on our intention to utilize these constructs in mouse models of inflammatory joint pain in future studies.

      We recognize that incorporating data from human cells would significantly enhance the translational relevance of our work, and we are actively pursuing collaborations to test these constructs in patient-derived FLS. However, a key component of our saRNA constructs, Theiler’s virus L*, has been shown to inhibit mouse, but not human, RNase L (Sorgeloos et al., PLoS Pathog 2013). Consequently, the E3-NSs-L* polyprotein may not function as intended in human cells. To address this limitation, future work will focus on developing constructs that incorporate inhibitors specific to human RNase L, thereby facilitating more effective translation of our findings to human systems.

      (3) The ML336 was only tested at one concentration and works mildly well, but would be more impactful if tested in a dose-response curve.

      We have now performed a concentration-response analysis for ML336 (Figure 8a and b), which demonstrates its concentration-dependent effects on transgene expression and saRNA elimination. Thank you for the suggestion.

      (4) Overall, there is not a cohesive narrative to the story, instead it comes off as we tried these three different approaches, and they worked in different contexts.

      We have revised the graphical abstract, results, and discussion to improve the cohesiveness of the manuscript's narrative and to better integrate the mechanistic rationale linking the different approaches. We appreciate the feedback.

      (5) The title is not supported by the data; the saRNA is still somewhat cytotoxic, immunostimulatory and the antiviral minimally controls transgene expression; suggest making this reflect the data.

      We have revised the title to better reflect the scope of the data and the mechanistic focus of the study. The updated title emphasizes the pathways targeted and the outcomes demonstrated, while avoiding overstatement. Thank you for this helpful recommendation.

    1. eLife Assessment

      This important work introduces a splitGFP-based labeling tool with an analysis pipeline for the synaptic scaffold protein bruchpilot, with tests in the adult Drosophila mushroom bodies, a learning center in the Drosophila brain. The evidence supporting the conclusions is solid. However, additional controls, validation of synapse-specificity, validation of activity-dependence, details on image processing, and additional functional experiments are needed to strengthen the study.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of BRP clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of BRP expression levels on site, using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels were associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure profiling can be inferred from only assessing BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP11 construct that is tagged to BRP in all BRP-expressing cells, independent of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that plastic changes are made visible in this study after an associative learning experiment with BRP intensity and cluster volume as read-out, but I would be reassured by direct measurement of synaptic plasticity with splitGFP directly connected to BRP, maybe at a different synapse that is more accessible.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type-specific fluorescence-tagging approach using a CRISPR/Cas9-induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing for the comparison of synapse profiles within single cells, cell types, MB compartments, and between different individuals. The aim is to analyse in more detail neuronal connectivity and circuits in this centre of associative learning. These are notoriously difficult to investigate due to the density of cells and structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. Next to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not contain any cloning strategy (gRNA, primer, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP-ORF, the line needs a general validation, including controls (Western Blot and correlative antibody staining against BRP) showing that overall BRP expression is not compromised by the GFP insertion and localizes as BRP does in wild-type flies, that flies are viable and have no defects in locomotion or in learning and memory formation, and that MB morphology is not affected compared to wild-type animals.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      a) For BRP cluster segmentation, the Materials and Methods state that intensity threshold and noise tolerance were "set". This setting has a large effect on the quantification, so it should be specified and the setting criteria named and justified (if set manually, how and why; if set automatically, to what). Additionally, if Python was used for "Nearest Neighbor" analysis, the code should be made available within this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      b) To better evaluate the quality of both the imaging analysis and image presentation, it would be important to state whether the presented and analysed images are deconvolved. If so, at least one proof-of-principle comparison of an original and a deconvolved file should be shown and quantified to show the impact of deconvolution on the output quality, as this is central to this study.

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      a) It would be important for the understanding and validation of the learning induced changes, if not (only) a ratio (of AZ density/local intensity) would be presented, but both values on their own, especially to allow a comparison to the quoted, previous AZ remodelling analysis quantifying BRP intensities (ref. 17, 18). It should be elucidated in more detail why only the ratio was presented here.

      b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      c) Additionally, "controls" for the unpaired values, that is, flies receiving neither shock nor odour, would help to evaluate the unpaired control values in the different MB compartments.

      d) The temporal resolution of the effect is very interesting (Figure 7D), and additional time points, especially between 90 and 270 min, might yield interesting results.

      e) Additionally, it would be very interesting and rewarding to have at least one additional assay relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging, or on a functional level by additional learning assays.

    4. Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4-expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment-specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

    1. eLife Assessment

      This study identifies astrocyte-intrinsic mechanisms by which LRRK2 G2019S, a mutation linked to familial Parkinson's disease, disrupts synaptic integrity in the anterior cingulate cortex. The findings are convincing, as they rely on a comprehensive set of in vivo and in vitro genetic, biochemical, proteomic, and electrophysiological approaches. They are important because of their translational value, being validated in both mouse models and post-mortem human samples.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to uncover how the Parkinson's disease-linked LRRK2 G2019S mutation affects synaptic integrity through astrocyte-intrinsic mechanisms. Specifically, they investigate whether LRRK2-driven ERM hyperphosphorylation disrupts astrocyte morphology and excitatory synapse maintenance, with a focus on regional specificity within the cortex.

      Strengths:

      (1) Novelty and significance: The work provides important insights into non-neuronal contributions to Parkinson's disease (PD) pathology by highlighting a previously underappreciated role of astrocytic ERM signaling in synapse maintenance. This astrocyte-specific mechanism might help explain early cognitive dysfunctions in PD.

      (2) Mechanistic depth: The authors present a detailed molecular pathway where the LRRK2 G2019S mutation increases ERM phosphorylation, disrupting Ezrin-Atg7 interactions critical for astrocyte morphology.

      (3) Robust methodology: The study uses a powerful combination of tools, including AAV-mediated gene delivery, BioID-based interactome mapping, PALE labeling, and patch-clamp electrophysiology to link molecular, morphological, and functional changes.

      (4) Physiological relevance: Parallel findings in both mouse models and human post-mortem brains suggest conservation of the observed phenotypes and strengthen the relevance to PD pathogenesis.

      Weaknesses:

      (1) Causal directionality: While ERM hyperphosphorylation is clearly shown to correlate with morphological and synaptic changes, the specific causal hierarchy, especially between Ezrin-Atg7 interaction loss and synapse alteration, is inferred but not definitively proven. For example, a rescue experiment directly restoring Atg7 function alongside Ezrin manipulation could strengthen this point.

      (2) Brain region specificity: Although regional differences between ACC and MOp are well documented, the underlying cause of this differential vulnerability remains speculative. Examining astrocyte heterogeneity within cortical layers or via transcriptomic/proteomic profiling could clarify these regional effects.

      (3) Autophagy function: While Atg7 knockdown leads to clear morphological changes, autophagic flux (e.g., LC3-II turnover or p62 accumulation) is not directly assessed. This would strengthen the mechanistic link to autophagy disruption.

      (4) GFAP-based astrogliosis interpretation: The conclusion that no astrogliosis occurs in LRRK2 G2019S mice is based solely on GFAP staining. However, GFAP-negative reactive states have been reported. Including additional markers would help validate this interpretation.

      (5) Impact on neuronal populations: The authors conclude that changes in inhibitory synapse density in the MOp are not rescued by astrocytic Ezrin manipulation and suggest developmental effects on interneurons. However, this is speculative without neuronal cell-type-specific data. Including interneuron density or synaptic connectivity analysis would make this claim more robust.

      (6) Despite these limitations, the authors substantially achieve their stated aims. Their results provide strong support for a model in which astrocytic ERM signaling downstream of LRRK2 contributes to region-specific synaptic changes, particularly in the anterior cingulate cortex. While certain mechanistic links, such as the role of Ezrin-Atg7 interaction in synaptic maintenance, would benefit from further functional validation, the study offers a well-supported framework for understanding astrocyte-intrinsic contributions to synaptic dysfunction in Parkinson's disease.

      This work is likely to contribute meaningfully to ongoing research in neurodegeneration, glial biology, and synaptic regulation. The methodological approaches - especially the combination of in vivo models with proteomics and electrophysiology - will be of interest to others studying astrocyte function and neuron-glia interactions. More broadly, the study highlights the importance of astrocyte heterogeneity and regional specialization in shaping neural circuit vulnerability, providing a valuable foundation for future investigations.

    3. Reviewer #2 (Public review):

      Summary:

      This is an important study that examines the relationship between a Parkinson's-associated mutation in LRRK2 kinase and increased ERM phosphorylation in astrocytes, altered excitatory and inhibitory synapse density and function, and a reduction in astrocyte size. The scope is impressively large and includes human and mouse samples, and employs immunolabeling, whole cell patch clamp recording techniques, molecular manipulation in vivo, and BioID. Experiments have appropriate controls, and the outcomes are mostly convincing. The chief weakness is that the study emphasizes scope over depth, such that it falls short of a unifying model of LRRK2-ERM interactions and leaves many outcomes difficult to interpret.

      The main idea is that the G2019S Parkinson's mutation in LRRK2 increases its kinase activity and that this either directly or indirectly increases ERM phosphorylation. This excessive ERM phosphorylation is expected to occur within perisynaptic astrocytic processes, reduce astrocyte complexity, and reduce excitatory synapse density and function in ACC. Overexpression of a dominant negative ezrin (phospho-dead) in astrocytes restores their morphology and excitatory synapse density in ACC. This pathway is well supported if taken on its own. But several datapoints presented do not fit this model. The reasoning driving selectivity to ACC and not M1 is not discussed or pursued (is it relevant that pERM levels appear lower in M1 at P21? Do astrocytes in S1 from G2019S mice also show reduced territories?); the differential effects on excitatory versus inhibitory synapses do not fit the model (or is this effect also expected to lie downstream of astrocytes?). Importantly, the effects of ezrin manipulation in wildtype samples (see below) are not integrated into the model, perhaps because the data run counter to expectation.

      Specific Concerns and Questions:

      (1) Effects in wildtype mice are not fully incorporated into the model. Overexpressing (OE) WT ezrin appears to reduce pERM levels by about half (Figure 1i vs 4B). OE-phospho-dead ezrin also appears to reduce pERM integrated density compared to control levels (same figures). This is not discussed (see also item 2). OE phospho-dead ezrin decreases synapse density and maybe function compared to OE WT ezrin in wildtype mice (4C, 4F), but it is not clear whether or not these data differ from unmanipulated wildtype sections/slices (Figures 2 and 3) because the data are normalized. These synaptic findings in wildtype should also be joined to the morphology findings in wildtype astrocytes, where OE-phospho-dead ezrin reduces astrocyte territory similar to LRRK2-G2019S. The shared morphological outcome is discussed as a potential defect in ERM phospho/dephospho balance, but it was hard to see if this could be similarly related to changes in synapse density.

      (2) Labeling for pERMs shown in wildtype mouse and control human is not convincing, but is convincing in the G2019S samples (e.g., Figure 1/S1, Figure 2) (although concentration in perisynaptic astrocytes is not clear). The data presented seem to better support the idea that the mutation confers a pathological gain of ERM phosphorylation (rather than hyperphosphorylation). If the faint labeling in wildtype and control samples is genuine, one would anticipate that pERM labeling would be different in shControl vs. shLrrk2 astrocytes.

      (3) Given the data presented, it would seem that overexpressing the BirA2 ezrin construct, like wildtype ezrin, could impact astrocyte biology. If overexpressing a wildtype ezrin reduces pERM levels, then perhaps the BirA2 construct expression already favors a closed conformation. This is not so much a critique of the approach as a request for clarification and to include, if possible, whether there are reasons to believe or data to support that the BirA2 construct adopts both open and closed conformations.

    4. Reviewer #3 (Public review):

      Summary:

      Wang et al. reported a new role of LRRK2-GS mutant in astrocyte morphology and synapse maintenance and a potential mechanism that acts through phosphorylation of ERM, which binds to ATG7. In both human LRRK2-GS patients and LRRK2-GS KI mouse brain cortex, they found increased ERM phosphorylation levels. LRRK2-GS alters excitatory and inhibitory synapse densities and functions in the cortex, which can be restored by p-ERM-dead mutant. They further demonstrated that LRRK2 regulates astrocyte morphological complexity in vivo through ERM phosphorylation. Proteomic and biochemistry approaches found that ATG7 interacts with Ezrin, which is inhibited by Ezrin phosphorylation. This provides a potential mechanism by which LRRK2-GS impairs the astrocyte morphology.

      Strengths:

      (1) Data in human PD patients (Figure 1B, C) is impressive, showing a clear increase of p-ERM in LRRK2-GS samples.

      (2) Both LRRK2-GS and siLRRK2 show similar phenotypes, supporting both GOF and LOF decrease astrocyte complexity and size.

      (3) Using p-ERM-dead and mimic mutants is elegant. The data is striking that the p-ERM-dead mutant can restore LRRK2-GS-induced excitatory synapse density in the ACC and astrocyte territory volume and complexity, while the p-ERM-mimic mutant can restore the siLRRK2 phenotype.

      (4) ATG7 binding to Ezrin provides a potential mechanism. It is compelling that siATG7 shows a similar decrease in astrocyte territory volume and complexity, and siATG7 in LRRK2-GS does not enhance the astrocyte phenotype.

      Weaknesses:

      (1) The authors claim that p-ERM colocalizes with astrocyte marker ALDH1L1, e.g., Figure 1E, F, G, H, J, K. It is hard to tell from the representative images. Given that this is critical for this paper, it would be appreciated if the authors could improve the images and show clear colocalization. The same concern applies to Figures S1, 2, 3. Validation of the p-ERM antibody is critical. Figure S4, using λ-PPase to eliminate the phosphorylation signal in general, is very helpful. Additional validation of the p-ERM antibody specific to ERM would be appreciated.

      (2) Does the total ERM level change /increase in LRRK2-GS samples? The increased p-ERM levels could be because the total ERM level increases. Then, the follow-up question is whether the total ERM level matters to the astrocyte phenotypes seen in the paper.

      (3) WT mice carry WT-LRRK2, which also has kinase activity to phosphorylate ERM. So, what are the effects of overexpression of the p-ERM mutants (dead or mimic) on the excitatory and inhibitory synapse densities and functions in WT mouse samples? In Figure 4, statistics should be done comparing WT+Ezrin O/E vs WT+phospho-dead Ezrin O/E. From what is shown in the graphs, it looks like phospho-dead Ezrin worsens the phenotype in WT mice, which is opposite to the GS mice. How can this be explained? The same question applies to the graphs in Figure 5.

      (4) Rab10 is not a robust substrate for the LRRK2-G2019S mutant, and p-Rab10 is very difficult to detect in mouse brains. The specificity of the pRab10 immunostaining signal in Fig. S8 is not certain.

      (5) Would ATG7, Ezrin, and LRRK2 form a complex?

    1. eLife Assessment

      In this manuscript, Park et al. developed a multiplexed CRISPR construct to genetically ablate the GABA transporter GAT3 in the mouse visual cortex, with effects on population-level neuronal activity. This work is important, as it sheds light on how GAT3 controls the processing of visual information. The findings are compelling, leveraging state-of-the-art CRISPR/Cas9 gene editing, in vivo two-photon laser scanning microscopy, and advanced statistical modeling.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of GAT3 in the visual system. First, they have developed a CRISPR/Cas9-based approach to locally knock out this transporter in the visual cortex. They then demonstrated electrophysiologically that this manipulation increases inhibitory synaptic input into layer 2/3 pyramidal cells. They further examined the functional consequences by imaging neuronal activity in the visual cortex in vivo. They found that the absence of GAT3 leads to reduced spontaneous neuronal activity and attenuated neuronal responses and reliability to visual stimuli, but without an effect on orientation selectivity. Further analysis of this data suggests that Gat3 removal leads to less coordinated activity between individual neurons and in population activity patterns, thereby impairing information encoding. Overall, this is an elegant and technically advanced study that demonstrates a new and important role of GAT3 in controlling the processing of visual information.

      Strengths:

      (1) Development of a new approach for a local knockout (GAT3).

      (2) Important and novel insights into visual system function and its dependence on GAT3.

      (3) Plausible cellular mechanism.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Park et al. have made a tool for spatiotemporally restricted knockout of the astrocytic GABA transporter GAT3, leveraging CRISPR/Cas9 and viral transduction in adult mice, and evaluated the effects of GAT3 on neural encoding of visual stimulation.

      Strengths:

      This concise manuscript leverages state-of-the-art CRISPR/Cas9 gene-editing technology for knocking out astrocytic genes. This has only to a small degree been performed previously in astrocytes, and it represents an important development in the field. Moreover, the authors utilize in vivo two-photon imaging of neural responses to visual stimuli as a readout of neural activity, in addition to validating their data with ex vivo electrophysiology. Lastly, they use advanced statistical modeling to analyze the impact of GAT3 knockout. Overall, the study comes across as rigorous and convincing.

      Weaknesses:

      Adding the following experiments would potentially have strengthened the conclusions and helped with interpreting the findings:

      (1) Neural activity is quite profoundly influenced by GAT3 knockout. Corroborating these relatively large changes to neural activity with in vivo electrophysiology of some sort as an additional readout would have strengthened the conclusions.

      (2) Given the quite large effects on neural coding in visual cortex assessed by jRGECO imaging, it would have been interesting if the mouse groups could have been subjected to behavioral testing that assesses the visual system.

    1. eLife Assessment

      This study offers important insights into the development of infants' responses to music based on the exploration of EEG neural auditory responses and video-based movement analysis. The convincing results revealed that evoked responses emerge between 3 and 12 months of age, but data analysis requires further refinement to fully complement the findings related to movement in response to music. This study will be of significant interest to developmental psychologists and neuroscientists, as well as researchers interested in music processing and in the translation of perception into action.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      Weaknesses:

      The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.

    3. Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.

      Weaknesses:

      I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.

      The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6, and 12 months of age and adults showed stronger auditory responses to music than to shuffled music. 6-month-olds, but not the other groups, also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli. In addition, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than to shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths:

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.

      Weaknesses:

      (1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.

      (2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.

      (3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?

    1. eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

    2. Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slowdown implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

    3. Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally show the expected changes, but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also slower than on Earth?). Moreover, a few other important questions might need to be addressed for completeness: how was it ensured that the astronauts remembered this instruction during the flight? (Could the control group have moved faster because they remembered the instruction better?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90° midway between predictions for 45° and 135°. The effective mass at 90° appears to be much closer to that of 45° than to that of 135° (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90° and 135° are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45°.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90° than for 135°, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90° and 135° as between 90° and 45°? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      We tested whether changing the ratio between the state and control weight matrices can generate the observed effect. As shown in Author response image 1 and Author response image 2, a change in the cost function cannot produce reduced peak velocity/acceleration and an advance in their timing simultaneously, but a change in mass estimation can. In other words, mass underestimation alone can explain the two key findings, amplitude reduction and timing advance. We cannot exclude the possibility of a change in cost function on top of the mass underestimation, but the principle of Occam's Razor favors the simpler explanation, i.e., using body mass underestimation to explain the key findings. We will include our exploration of possible changes in the cost function in the revision (in the Supplemental Materials).

      Author response image 1.

      Simulation using an altered cost function with α = 3.0. Panels A, B, and E show simulated position, velocity, and acceleration profiles, respectively, for the three movement directions. Solid lines correspond to pre- and post-exposure conditions, while dashed lines represent the in-flight condition. Panels C and D display the peak velocity and its timing across the three phases (Pre, In, Post), and Panels F and G show the corresponding peak acceleration and its timing. Note that varying the cost function, while leading to reduced peak velocity/acceleration, leads to an erroneous prediction of delayed timing of peak velocity/acceleration.

      Author response image 2.

      Simulation results using a cost function with α = 0.3. The format is the same as in Author response image 1. Note that this ten-fold decrease in α, while getting the timing of peak velocity/acceleration right (advanced), leads to an erroneous prediction of increased peak velocity/acceleration.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Our paper does not aim to quantitatively reproduce human reaching movements in microgravity. We will make this clearer in the revision.

      (1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques, while the actual situation is that people move their finger across a touch screen. The two-link arm model assumes planar movements, but our participants move their hand on a table top without vertical support to constrain their movement in 2D.

      (2) Our study merely uses well-established (though simplified) models to qualitatively predict the overall behavioral patterns if mass underestimation is at play. For this purpose, the results are well in line with the models' qualitative predictions: we indeed confirm that key kinematic features (peak velocity and acceleration) follow the same ranking order of movement direction conditions as predicted.

      (3) Using model simulation to qualitatively predict human behavioral patterns is a common practice in motor control studies, prominent examples including the papers on optimal feedback control (Todorov, 2004 and 2005) and movement vigor (Shadmehr et al., 2016). In fact, our model was inspired by the model in the latter paper.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology, 26(14), 1929–1934.
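      To make the notion of a direction-dependent effective mass concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard operational-space relation for a planar two-link arm, m_eff(u) = 1 / (u^T J M^-1 J^T u), where M is the joint-space inertia matrix and J the hand Jacobian. The segment parameters, the arm posture, and all function names are illustrative assumptions and are not taken from the manuscript.

```python
import math

# Illustrative (assumed) arm parameters; NOT taken from the manuscript.
L1, L2 = 0.30, 0.33      # upper-arm and forearm lengths (m)
M1, M2 = 1.4, 1.0        # segment masses (kg)
R1, R2 = 0.11, 0.16      # joint-to-centre-of-mass distances (m)
I1, I2 = 0.025, 0.045    # segment inertias about the centre of mass (kg m^2)


def inertia_matrix(q2):
    """Joint-space inertia matrix of a planar two-link arm (depends on elbow angle only)."""
    a1 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2)
    a2 = M2 * L1 * R2
    a3 = I2 + M2 * R2**2
    m12 = a3 + a2 * math.cos(q2)
    return [[a1 + 2 * a2 * math.cos(q2), m12], [m12, a3]]


def jacobian(q1, q2):
    """Hand Jacobian: maps joint velocities to hand velocity (x, y)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def effective_mass(q1, q2, direction_deg):
    """Effective mass (kg) at the hand along a unit direction u: 1 / (u^T J M^-1 J^T u)."""
    M = inertia_matrix(q2)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Minv = [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]
    J = jacobian(q1, q2)
    th = math.radians(direction_deg)
    u = (math.cos(th), math.sin(th))
    # v = J^T u: the hand direction expressed in joint space
    v = (J[0][0] * u[0] + J[1][0] * u[1],
         J[0][1] * u[0] + J[1][1] * u[1])
    mobility = sum(v[i] * Minv[i][j] * v[j] for i in range(2) for j in range(2))
    return 1.0 / mobility


# Assumed starting posture: shoulder at 45 deg, elbow flexed 90 deg.
Q1, Q2 = math.radians(45), math.radians(90)
for d in (45, 90, 135):
    print(f"direction {d:3d} deg: effective mass = {effective_mass(Q1, Q2, d):.2f} kg")
```

      The printed values depend entirely on the assumed posture and segment parameters, so they should not be read as the effective masses used in the study. The sketch only illustrates why a point-mass simplification still carries direction dependence once the mass is taken to be the effective mass.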

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      We are happy to include exemplary speed and acceleration trajectories. One example subject's detailed trajectories are shown below and will be included in the revision. The reduced and advanced velocity/acceleration peaks are visible in typical trials.

      Author response image 3.

      Hand speed profiles (upper panels), hand acceleration profiles (middle panels) and speed profiles of the primary submovements (lower panels) towards different directions from an example participant.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response: In brief, our simulations show that Coriolis and centripetal forces, despite having some directional anisotropy, only have small effects on predicted kinematics (see our responses to Reviewer 2). We will move descriptions of the model into the main text with more justifications for using a simple model.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response: Indeed, the percentage of submovements only increases slightly, but the more important change is that the IPI (the inter-peak interval between submovements) also increases at the same time. Moreover, it is the effect of IPI that significantly predicts the duration increase in our linear mixed model. We will highlight this fact in our revision to avoid confusion.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally show the expected changes, but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      We agree that the effect of mass underestimation is smaller in the 45° direction than in the other two directions, possibly because this movement relies on a single joint (elbow) as opposed to two joints (elbow and shoulder). In addition, movement correction using one joint is probably easier (as also suggested by another reviewer); this possibility will be further discussed in the revision. However, we find that our model simplification (excluding Coriolis and centripetal torques) does not affect our main conclusions. First, we performed a simple simulation and found that, under the current optimal hand trajectory, incorporating Coriolis and centripetal torques has only a limited impact on the resulting joint torques (see simulations in Author response image 4). One reason is that we used smaller movements than Hollerbach & Flash did. In addition, we applied an optimal feedback control model to a more realistic 2-joint arm configuration. Despite its simplicity, this model produced a speed profile consistent with our current predictions and made similar predictions regarding the effects of mass underestimation (Author response image 5). We will provide a more realistic 2-joint arm model with muscle dynamics in the revision to improve the simulation further, but the message will be the same: including or excluding Coriolis and centripetal torques will not affect the theoretical predictions about mass underestimation. Second, as the reviewer correctly pointed out, the mass (and its underestimation) also affects these two torque terms, so its effect on kinematic measures changes little even with the full model.

      Author response image 4.

      Joint angles and joint torques of the shoulder and elbow for simulated trajectories in different directions. A. Shoulder (green) and elbow (blue) angles over time for the 45° movement direction. B. Components of joint interaction torques at the shoulder. Solid line: net torque at the shoulder; dotted line: shoulder inertial torque; dashed line: shoulder Coriolis and centripetal torque. C. Same plot as B for the elbow joint. D–F. Coriolis and centripetal components in the full 360° workspace, beyond the three movement directions (45°, 90°, and 135°). D. Net torque. E. Inertial torque. F. Combined Coriolis and centripetal torque. Note that the polar plot of Coriolis/centripetal torques (F) has a scale two orders of magnitude smaller than that of the inertial torque in our simulation. All torques were simulated with the optimal movement duration. Torques were squared and integrated over each trajectory.

      Author response image 5.

      Comparison between simulation results from the full model with the addition of Coriolis/centripetal torques (left) and the simplified model (right). The position profiles (top) and the corresponding speed profiles (bottom) are shown. Solid lines are for normal mass estimation and dashed lines for mass underestimation in microgravity. The three colors represent three movement directions (dark red: 45°, red: 90°, yellow: 135°). The full model used a 2-link arm model without realistic muscle dynamics (these will be included in the formal revision); thus the speed profile is not smooth. Importantly, the full model also predicts the same effect of mass underestimation, i.e., reduced peak velocity/acceleration and an advance in their timing.
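
      The comparison of inertial and velocity-dependent torques discussed above can be sketched with a textbook two-link planar arm. This is only an illustration under stated assumptions, not the authors' simulation: the segment parameters, the start position, and the minimum-jerk hand path are all hypothetical choices, and only the reach length (~12 cm) and duration (~350 ms) are taken from the text. The function integrates the squared inertial torque and the squared Coriolis/centripetal torque over one reach, summed over both joints.

```python
import math

# Illustrative two-link arm parameters (assumptions, not from the manuscript).
L1, L2 = 0.30, 0.33      # upper-arm and forearm lengths (m)
M2, S2 = 1.0, 0.16       # forearm mass (kg), distance to its centre of mass (m)
I1, I2 = 0.025, 0.045    # lumped link inertia terms (kg m^2)
A1 = I1 + I2 + M2 * L1 ** 2
A2 = M2 * L1 * S2
A3 = I2


def inverse_kinematics(x, y):
    """Shoulder and elbow angles (rad) for a hand position relative to the shoulder."""
    c2 = (x * x + y * y - L1 ** 2 - L2 ** 2) / (2 * L1 * L2)
    q2 = math.acos(max(-1.0, min(1.0, c2)))
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return q1, q2


def min_jerk(s):
    """Normalised minimum-jerk position profile for s in [0, 1]."""
    return 10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5


def torque_components(direction_deg, start=(0.0, 0.35), dist=0.12, dur=0.35, dt=0.001):
    """Return (integrated squared inertial torque, integrated squared
    Coriolis/centripetal torque) for a straight minimum-jerk reach."""
    n = int(round(dur / dt))
    th = math.radians(direction_deg)
    q = []
    for k in range(n + 1):
        p = dist * min_jerk(k / n)
        q.append(inverse_kinematics(start[0] + p * math.cos(th),
                                    start[1] + p * math.sin(th)))
    inertial = velocity = 0.0
    for k in range(1, n):
        # Central finite differences for joint velocity and acceleration.
        qd = [(q[k + 1][j] - q[k - 1][j]) / (2 * dt) for j in range(2)]
        qdd = [(q[k + 1][j] - 2 * q[k][j] + q[k - 1][j]) / dt ** 2 for j in range(2)]
        c2, s2 = math.cos(q[k][1]), math.sin(q[k][1])
        # Inertial term M(q) * qdd.
        tau_in = [(A1 + 2 * A2 * c2) * qdd[0] + (A3 + A2 * c2) * qdd[1],
                  (A3 + A2 * c2) * qdd[0] + A3 * qdd[1]]
        # Coriolis/centripetal term C(q, qd).
        tau_cc = [-A2 * s2 * qd[1] * (2 * qd[0] + qd[1]),
                  A2 * s2 * qd[0] ** 2]
        inertial += sum(t * t for t in tau_in) * dt
        velocity += sum(t * t for t in tau_cc) * dt
    return inertial, velocity


if __name__ == "__main__":
    for d in (45, 90, 135):
        i, v = torque_components(d)
        print(d, i, v, v / i)
```

      The exact ratio depends on the assumed arm parameters and posture, so the numbers printed here should not be read as a reproduction of Author response image 4.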

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response: Neuromuscular deconditioning is indeed an effect of spaceflight and microgravity; thank you for raising it, as we omitted discussion of its possible contribution in the initial submission. However, muscle weakness is less pronounced in upper-limb muscles than in postural and lower-limb muscles (Tesch et al., 2005). Handgrip strength decreases by 5% to 15% after several months (Moosavi et al., 2021); atrophy of shoulder and elbow muscles, though not directly measured, was estimated to be minimal (Shen et al., 2017). Muscle weakness is unlikely to play a major role here, since our reaching task involves small movements (~12 cm) with joint torques on the order of ~2 N·m. Coriolis/centripetal torques do not affect the putative mass effect (as shown in the simulations above). The reviewer suggests that poor coordination in microgravity might contribute to slowing down and more submovements. Poor coordination is an umbrella term for any motor control problem, and it can explain any microgravity effect. The feedforward control changes caused by mass underestimation can also be viewed as poor coordination. If we limit it to the coordination of the two joints or of Coriolis/centripetal torques, we should expect to see some changes in trajectory curvature in microgravity. However, we further analyzed our reaching trajectories and found no sign of a curvature increase in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of a Type II error is quite low here. We will include descriptive statistics of these new analyses in our revision.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Moosavi, D., Wolovsky, D., Depompeis, A., Uher, D., Lennington, D., Bodden, R., & Garber, C. E. (2021). The effects of spaceflight microgravity on the musculoskeletal system of humans and animals, with an emphasis on exercise as a countermeasure: A systematic scoping review. Physiological Research, 70(2), 119.

      Shen, H., Lim, C., Schwartz, A. G., Andreev-Andrievskiy, A., Deymier, A. C., & Thomopoulos, S. (2017). Effects of spaceflight on the muscles of the murine shoulder. The FASEB Journal, 31(12), 5466.

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      We appreciate these very helpful suggestions about our model presentation. Indeed, our initial submission did not give detailed model descriptions in the main text, due to text limits for early submissions. We actually used a finite-horizon framework throughout, with a pre-specified duration derived from the utility model. In the revision, we will make that point clear, and we will also revise the Methods section to explicitly distinguish feedforward vs. feedback components, clarify the use of mass underestimation in both utility and control models, and update the equations accordingly.
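
      The reviewer's question about computing the feedback gain with an underestimated mass while simulating the plant with the real mass can be made concrete with a minimal finite-horizon LQ sketch. Everything below is an assumption for illustration: a one-dimensional point mass, a terminal-only state cost, the weights `w_pos`, `w_vel`, and `r`, and the masses used in the example. It is not the authors' utility or control model, and whether a mismatch lowers or advances the speed peak in such a model depends on the cost terms, so no such outcome is claimed here.

```python
def riccati_gains(mass, n_steps, dt, r=1.0, w_pos=1e9, w_vel=1e7):
    """Finite-horizon discrete LQ gains for a point mass.

    State x = [position error, velocity], input u = force.
    A = [[1, dt], [0, 1]], B = [0, dt / mass]^T.
    Cost = sum(r * u^2) + w_pos * err_N^2 + w_vel * vel_N^2 (hypothetical weights).
    Returns a list of (k_pos, k_vel), one pair per time step.
    """
    b = dt / mass
    s11, s12, s22 = w_pos, 0.0, w_vel          # terminal cost-to-go matrix S_N
    gains = [None] * n_steps
    for k in range(n_steps - 1, -1, -1):
        denom = r + b * b * s22                # R + B^T S B
        g1 = b * s12                           # B^T S A, first entry
        g2 = b * (dt * s12 + s22)              # B^T S A, second entry
        gains[k] = (g1 / denom, g2 / denom)
        # A^T S A
        a11 = s11
        a12 = s11 * dt + s12
        a22 = s11 * dt * dt + 2.0 * s12 * dt + s22
        # S_k = A^T S A - (A^T S B)(B^T S A) / (R + B^T S B)
        s11 = a11 - g1 * g1 / denom
        s12 = a12 - g1 * g2 / denom
        s22 = a22 - g2 * g2 / denom
    return gains


def simulate(true_mass, assumed_mass, dist=0.12, dur=0.35, dt=0.001):
    """Gains are computed with `assumed_mass`; the plant moves with `true_mass`.

    Returns (final position error, peak speed, time of peak speed).
    """
    n = int(round(dur / dt))
    gains = riccati_gains(assumed_mass, n, dt)
    err, vel = -dist, 0.0
    peak_speed, peak_time = 0.0, 0.0
    for k, (k_pos, k_vel) in enumerate(gains):
        force = -(k_pos * err + k_vel * vel)
        err, vel = err + dt * vel, vel + dt * force / true_mass
        if abs(vel) > peak_speed:
            peak_speed, peak_time = abs(vel), (k + 1) * dt
    return err, peak_speed, peak_time


if __name__ == "__main__":
    print("matched   :", simulate(true_mass=2.0, assumed_mass=2.0))
    print("mismatched:", simulate(true_mass=2.0, assumed_mass=1.4))
```

      Keeping the assumed and true masses as separate arguments makes explicit which parts of the controller inherit the misestimate, which is the distinction the reviewer asks the authors to clarify.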

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also performed at reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how was it ensured that the astronauts remembered this instruction during the flight? (Could the control group move faster because they better remembered the instruction?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people's movement duration, even when they are pushed to move fast. The second reason for using fast movements is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place. Slow movements would inevitably involve online corrections that might obscure the effect of mass underestimation. Note that movement slowing is observed not only in our speed-emphasized reaching task but also in whole-arm pointing in other astronaut studies (Berger, 1997; Sangals, 1999), which are cited in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the Control Center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. In fact, all the pre-, in-, and post-flight test sessions were administered by the same group of experimenters with the same instruction. It is common for astronauts to serve as both participants and experimenters, and they were well trained for this role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. We will include these experimental details and the rationales for emphasizing fast movements in the revision.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      We believe the differences between our study and Gaveau et al.'s study cannot be simply attributed to single-joint versus multi-joint movements. One of the most salient differences is that their adaptation is about incorporating microgravity in control for minimizing effort, while our adaptation is about correctly perceiving body mass. We will elaborate on possible reasons for the lack of learning in light of this previous study.

      We can elaborate on "sensory bias" and "fundamental constraint of the sensorimotor system". If an inertial change is perceived (like an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching in tens of trials. In this case, sensory cues are veridical as they correctly inform about the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting correctly. Our statement was too brief in the initial submission; we will expand it in the revision.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90° midway between predictions for 45° and 135°. The effective mass at 90° appears to be much closer to that of 45° than to that of 135° (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90° and 135° are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45°.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90° than for 135°, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90° and 135° as between 90° and 45°? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion, for two reasons. 1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that would require model fitting and parameter estimation. Instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic properties (Figure 2 and Figure 3, as questioned) follow the same ranking order of directions as predicted.

      We thank the reviewer for pointing out the apparent discrepancy between model simulation and observed data. We will elaborate on the reasons behind the discrepancy in the revision.
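
      For readers following the argument about direction-dependent effective mass, one standard way to define it for a planar arm is given below. This is the usual operational-space expression and is offered only as a reference point; the manuscript's own definition (Figure S1A) may differ. Here $q$ is the joint configuration, $J(q)$ the hand Jacobian, $M(q)$ the joint-space inertia matrix, and $u$ the unit vector along the movement direction $\theta$; all symbols are introduced here for illustration.

```latex
\[
  m_{\mathrm{eff}}(\theta)
  = \frac{1}{u^{\top}\, J(q)\, M(q)^{-1} J(q)^{\top}\, u},
  \qquad
  u = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}.
\]
```

      Because $J(q) M(q)^{-1} J(q)^{\top}$ is a symmetric positive-definite matrix, $1/m_{\mathrm{eff}}$ varies sinusoidally with $2\theta$, so equally spaced directions need not give equally spaced effective masses.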

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. The study by Fisk et al. (1993) indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.'s study, people performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings are about feedforward control, i.e., the reduced and "advanced" peak velocity/acceleration in discrete and ballistic reaching movements. Note that the peak acceleration happens as early as approximately 90-100 ms into the movements, clearly showing that feedforward control is affected, a different effect from Fisk et al.'s findings. It is unlikely that people "advanced" their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.
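
      The purely feedforward consequence of a mass misestimate, which both the reviewer and the authors refer to, can be illustrated with an open-loop sketch. It assumes a point mass, a minimum-jerk acceleration plan, no feedback, and hypothetical mass values; it is not the authors' optimal feedback control model and does not by itself reproduce the earlier timing of the peaks.

```python
def min_jerk_accel(t, dist, dur):
    """Acceleration of a minimum-jerk reach of length `dist` and duration `dur`."""
    s = t / dur
    return dist / dur ** 2 * (60 * s - 180 * s ** 2 + 120 * s ** 3)


def open_loop_reach(true_mass, assumed_mass, dist=0.12, dur=0.35, dt=0.0005):
    """Plan forces with `assumed_mass`, apply them to `true_mass`, no feedback.

    Returns (final displacement, peak speed).
    """
    pos = vel = peak = 0.0
    n = int(round(dur / dt))
    for k in range(n):
        force = assumed_mass * min_jerk_accel(k * dt, dist, dur)  # planned command
        vel += force / true_mass * dt                             # real limb response
        pos += vel * dt
        peak = max(peak, vel)
    return pos, peak


if __name__ == "__main__":
    print("correct estimate:", open_loop_reach(true_mass=2.0, assumed_mass=2.0))
    print("underestimate   :", open_loop_reach(true_mass=2.0, assumed_mass=1.4))
```

      In this open-loop case the displacement and peak speed both scale with the ratio of assumed to true mass, so an underestimate produces an undershoot that would have to be made up by later corrections.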

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction between 0.1 and 0.5N (e.g., Ayyildiz et al., 2018). We believe that the directional variation is even smaller than 0.1N. It is very small compared to the force used to accelerate the arm for the reaching movement (10-15N). Thus, friction anisotropy is unlikely to explain our data.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We would argue that shoulder stability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    1. eLife Assessment

      The authors proposed two hypotheses: first, that methamphetamine induces neuroinflammation, and second, that it alters neuronal stem cell differentiation. These are valuable hypotheses, and the authors provided in vivo observations of the methamphetamine response in mice. However, concerns remain regarding the interpretation of the data, and the current evidence is incomplete, requiring substantial experimental validation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript focuses on single-cell RNA sequencing (scRNA-seq) analysis following chronic methamphetamine (METH) treatment in mice. The authors propose two hypotheses:

      (1) METH induces neuroinflammation involving T and NKT cells, and (2) METH alters neuronal stem cell differentiation.

      Strengths:

      The authors provide a substantial dataset with numerous replicates, offering valuable resources to the research community.

      Weaknesses:

      Concerns persist regarding the interpretation of data and the validation of experiments. First, the presence of T cells, NKT cells, and neutrophils in both the control and METH-treated hippocampi suggests that blood contamination rather than immune cell infiltration is the cause. Since the authors claim that METH disrupts the blood-brain barrier, increasing the infiltration of these immune cells, identifying the source of these immune cells is critical.

      Secondly, the pseudotime analysis, which suggests altered neural stem cell (NSC) differentiation, is not conclusively supported by the current data and requires further validation.

      Overall, the authors provided comprehensive in vivo data on the impact of methamphetamine on the hippocampus; however, further in vivo and in vitro experimental validation of the key findings is needed.

    3. Reviewer #2 (Public review):

      Summary:

      Chronic methamphetamine (METH) abuse leads to significant structural and functional deficits in the cortical and hippocampal regions in humans. However, the specific mechanisms underlying chronic METH-induced neurotoxicity in the hippocampus and its contribution to cognitive deficits remain poorly understood. The authors aim to address this knowledge gap using a single-cell transcriptomic atlas of the hippocampus under chronic METH exposure in mice. They present analyses of differential gene expression, cell-cell communication, pseudotemporal trajectories, and transcription factor regulation to characterize the cellular-level impact of METH abuse. However, the overall quality of the manuscript is currently very poor due to a lack of basic quality control, overly descriptive content, and unclear conclusions.

      Strengths:

      The major strength of this study is that it may represent the first report on the impact of METH on the hippocampus in mice. However, the authors should clarify whether similar studies have been previously conducted, as this point remains uncertain.

      Weaknesses:

      Despite this potential novelty, the study has numerous weaknesses. Notably, single-cell RNA sequencing was unable to capture an adequate number of neuronal populations. Neurons accounted for only approximately 0.6% of the total nuclei, representing a significant underrepresentation compared to their actual physiological proportion. Given that the behavioral effects of METH are likely mediated by neuronal dysfunction, readers would reasonably expect to see transcriptional changes in neurons. The authors should explain why they were unable to capture a sufficient number of neurons and justify how this incomplete dataset can still provide meaningful scientific insights for researchers studying METH-induced hippocampal damage and behavioral alterations.

      Another significant weakness of this study is the lack of a cohesive hypothesis or overarching conclusion regarding how METH impacts neural populations. The authors provide a largely descriptive account of transcriptional alterations across various cell types, but the manuscript lacks clear, biologically meaningful conclusions. This descriptive approach makes it difficult for readers to identify the key findings or take-home messages. To improve clarity and impact, the authors should focus on developing and presenting a few plausible hypotheses or mechanistic scenarios regarding METH-induced neurotoxicity, grounded in their scRNA-seq data. Including schematic figures to illustrate these hypotheses would also help readers better understand and interpret the study.

      The final major weakness of this study is its poor readability. It appears that the authors did not adequately proofread the manuscript, as there are numerous typographical errors (e.g., line 333: trisulting; line 756: essencial), unsupported scientific claims lacking citations (e.g., lines 485, 503, 749-753), and grammatically incorrect sentences (e.g., lines 470-472, 540-543, 749-753). In addition, many paragraphs are unorganized and overly descriptive, which further hinders clarity. Some figures are also problematic - too small in size and overcrowded with text in fonts that are difficult to read. It is recommended that the authors carry out quality control. There are too many typographical and grammatical errors to list individually; the authors should carefully review and revise the entire manuscript to address all of these issues.

      Overall, this study could have offered some incremental new insights into neurotoxicity following chronic METH exposure, despite the poor capture of neuronal populations. However, the current manuscript feels more like a data dump than a thoughtfully constructed scientific narrative. I encourage the authors to extract and highlight meaningful biological insights from their dataset and clearly articulate these in the conclusion, ideally supported by an additional schematic figure. Furthermore, I strongly urge the authors to substantially improve the basic quality of the manuscript through careful proofreading and by seeking feedback from colleagues or other readers.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to elucidate the intricate mechanisms underlying cognitive decline induced by chronic METH abuse, focusing on the hippocampus at a single-cell resolution. The authors established a robust mouse model of chronic METH exposure. They observed significant impairments in working memory, spatial cognition, learning, and cognitive memory through Y-maze and novel object recognition tests. To gain deeper insights into the cellular and molecular changes, they utilized single-cell RNA sequencing to profile hippocampal cells. They performed extensive bioinformatics analyses, including cell clustering, differential gene expression, cellular communication, pseudotemporal trajectory, and transcription factor regulation.

      Strengths:

      (1) The authors performed a comprehensive suite of bioinformatics analyses, including differential gene expression, cellular cross-talk, pseudotime trajectory, and SCENIC analysis, which enable a multifaceted exploration of METH-induced changes at both the cellular and molecular levels.

      (2) The study demonstrates an awareness of the potential influence of circadian rhythms, dedicating a specific section in the discussion to the disruption of circadian rhythms, which has rarely been mentioned in previous studies on METH. They highlight the frequent occurrence of circadian regulation in their analysis across several cell types.

      (3) The pseudotime analysis provides valuable insights into hindered neurogenesis, showing a shift in NSC differentiation toward astrocytes rather than neuroblasts in METH-treated mice. The detailed analysis of BBB components (endothelial cells, mural cells, SMCs) and their heterogeneous responses to METH is also a significant contribution.

      Weaknesses:

      (1) While the bioinformatics analyses are extensive, the study is primarily descriptive at the molecular level. The absence of experimental validation, such as targeted mRNA/protein quantification and gene knockdown/overexpression to confirm the causal relationship between these identified genes and METH-induced cognitive deficits, is a notable limitation.

      (2) While the discussion extensively covers the functional implications of specific molecular pathways and cell types, it would greatly benefit from a comparison of these findings with existing RNA sequencing data from other METH models in hippocampal tissue.

      (3) The conclusion that "prolonged METH use may progressively impair cognitive function" may not be uniformly supported by the behavioral data: Figures 1C and F (discrimination and preference indexes) show a further decline in the METH group at the 4-week test compared with the 2-week test. In contrast, Figures 1E and H present a contradictory pattern.

    1. eLife Assessment

      This valuable study investigates the neural basis of bidirectional communication between the cortex and hippocampus during learning. The evidence supporting the identification of specific circuits and functional cell types involved is convincing. However, certain aspects of the behavioral analysis and statistical interpretation remain incomplete. Overall, the work will be of interest to neuroscientists studying learning and memory.

    2. Reviewer #1 (Public review):

      Summary:

      This work by Hall et al provides a novel and important new finding about communication between the anterior cingulate cortex (ACC) and the CA1 region of the dorsal hippocampus: there is a clear ability of ACC to predict CA1 activity, and that is modulated by learning/experience. Furthermore, they have some evidence that the modulation differs by whether the CA1 neurons were in the deep versus superficial sub-layer of CA1. The evidence is suggestive of new and exciting findings, but some gaps and weaknesses remain to be addressed before I believe all of the authors' claims can be supported. The figures also need to be slightly better organized, and the discussion is missing a major dimension in my opinion. Overall, this is a strong submission, but with some gaps to fill.

      Strengths:

      (1) This is a well-written manuscript - the introduction was especially clear, well-cited, and motivating.

      (2) The sub-layer specific communication between ACC and CA1 represents the discovery of a novel and functionally impactful piece of neurobiology.

      (3) Optogenetics was an important verification of ACC-CA1 communication, as was the analysis of neurons by waveform type.

      Weaknesses:

      (1) Figure 2: Why are the data separated into two groups from the outset? If all data are combined, is there a general drop in prediction gain from pre to post?

      (2) 2b and 2c are important since they are complementary means to show the same thing, and it is important that they cross-validate each other, especially since the non-significant task active neuron difference in 2b appears to be nearly as strong as the significant difference to its left. A more holistic analysis can be done to compare these dimensions.

      (3) Sup vs deep neuron definition: Did the authors have any means to validate this anatomical separation using histology or otherwise? I don't believe they described anything like that, and instead use physiology to infer anatomical location. I understand anatomy-based methods may be practically impossible with tetrodes, but this limitation should at least be mentioned, and it should be explained that without something like silicon probes or histological validation, anatomy had to be inferred from physiology.

      (4) Superficial vs deep differences in firing rate ratio based on PG: there are many fewer CA1deep neurons, but in 4c, the trends appear to be the same pre-training, with top PG lower than the others. It seems the lack of difference in CA1deep in 4c may be due to the much lower power/n. This should be discussed or addressed.

      (5) In Figure 5, the term "firing rate ratio" is used, and it sounds the same as in previous figures, but this is a different ratio (based on modulation by opto stim, not task).

      (6) I would like to learn more about these v-type neurons. I understand we do not yet know about their molecular or morphologic correlate, but more analysis can be done with the current data.

      (7) I would like more discussion of ACC-CA1 connectivity.

      (8) Some elements may be missing from the discussion, relating baseline functioning versus post-learning function.

    3. Reviewer #2 (Public review):

      Summary:

      This study uncovers an inhibitory pathway from the anterior cingulate cortex (ACC) to pyramidal cells in the superficial sublayer of hippocampal area CA1 (CA1sup). As ACC neuron spiking tends to precede hippocampal ripples, this presents the intriguing possibility that ACC inputs are selectively inhibiting particular CA1sup neurons, which could play a role in the reactivation of task-related ensembles known to take place during hippocampal ripples. Indeed, through a generalized linear model (GLM) analysis, the authors demonstrate that the ACC activity within the 200ms immediately preceding the ripple is predictive of the ripple content.

      Strengths:

      The biggest strength of the work is the optogenetic manipulation experiments, which convincingly demonstrate that stimulation of ACC pyramidal neurons activates an interneuron population with symmetric spike waveforms, and inhibits parvalbumin interneurons and pyramidal cells in CA1sup but not CA1deep sublayer.

      An additional strength is the GLM analysis, which consistently shows that ACC activity preceding the ripple is predictive of hippocampal activity during the ripple considerably more than in shuffled data for all cells and periods tested.

      Weaknesses:

      The major weakness of this work is that the link with learning and memory is not very well supported.

      The only evidence of rebalancing and reorganization appears to be a single statistical test (the test in Figure 1f, p=0.013) demonstrating a decrease of the GLM prediction gain from pre-task sleep to post-task sleep; the same test is repeated for subsets of the data in the rest of the figures. As the idea of rebalancing and reorganization is central to the paper as currently written, exploring it through another measure, independent of the GLM prediction gain, should be expected. The notion that this pathway is suppressed in sleep following learning can be supported by demonstrating a decrease in any of the following measures: ACC spike-triggered average CA1sup responses, cross-covariances (Wierzynski et al 2009) between ACC and CA1sup cells in post-task sleep, or ripple-triggered cross-correlations (Sirota et al. 2009).

      The differences between task-active and task-inactive neurons are not convincing. The separation between task-active and task-inactive neurons divides a distribution that is far from bimodal into what appear to be two arbitrary groups. Similarly, the authors divide cells relative to their prediction gain ("Top PG" and "Bottom PG" in Figure 2c), which fails to select for the population of significantly predicted cells (relative to the shuffle). Within CA1sup cells, after learning, there is a significant decrease in the prediction gain for "task-inactive" cells but not "task-active" cells, but it is important to keep in mind that the "task-active" group contains only 24 neurons, and there was no difference between the two groups of cells ("task-active" vs "task-inactive") when directly compared.

      Finally, it is not clear whether the identity of the pathway-responsive CA1sup neurons is fixed or whether it may change with learning. A deeper analysis of the cell pair cross-correlations or the weights of the GLM analysis may reveal whether there is a reorganization of CA1sup responses (some cells that were inhibited are no longer inhibited, and vice versa) or a dampening (the same CA1sup cells are inhibited in both cases, but the inhibition is less pronounced in post-task sleep). The possibility of a rigid circuit dampened immediately following fear conditioning is not discussed by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Hall and colleagues investigate how the coupling of activity from ACC to CA1 is altered by fear learning, showing that during sleep immediately before learning, there is evidence for increased coupling of ACC activity with neurons that will subsequently be inhibited during the learning process. They go on to show that this effect seems to be mediated most by a subpopulation of neurons in the superficial layer of CA1. This fits with previous reports suggesting that these superficial neurons are key for the flexible updating of memory. The authors then go on to show that artificial activation of ACC using optogenetics results in varied effects in CA1, including a subtle decrease in activity of superficial neurons that lasts longer than the stimulus itself. Finally, the authors present some preliminary data suggesting that different interneurons may be recruited by this optogenetic stimulation in different ways and at different times.

      Overall, this is an interesting paper, but much of the analysis is very preliminary, and much of the crucial data about the learning effects and alterations to cell firing are not presented clearly and fully. This is further confounded by a rather opaque description of the results and analysis in the text. Overall, there is something very interesting here, but there needs to be a substantial series of extra analyses to clearly say what this is. In many cases, more robust analysis may render the results underpowered, which could dramatically change the conclusions of the paper.

      Strengths:

      The authors performed difficult, dual-location recordings across a multi-day learning paradigm, which seems like it could be a really nice dataset. They delve into the circuit basis of an interesting finding regarding ACC to CA1 connectivity and how this changes before and after fear conditioning. They provide data to suggest this connectivity may be through specific and distinct subcircuits in CA1.

      Weaknesses:

      (1) There is essentially no information in the text or figures about what the actual learning was, how it was done, how individual animals performed, and how any of these metrics related to learning. Looking at the methods, the authors did a number of things never mentioned anywhere in the text or figures, including novel arena exposure, contextual reexposure in extinction after learning, etc. It seems that this is a very rich dataset that has not been presented at all. I would recommend at the very least:
      a) Plot all of the behavioural training data, and how each mouse relates to one another - did the mice learn? At this stage, we don't know!
      b) Explain in the text in detail exactly what was done and why, and what this tells us about the neuronal activity.
      c) If there is variance in learning and/or conditioning, does this relate to features in the analysis, such as the GLM result?

      (2) Along similar lines, a key metric for most of the paper is that neurons most coupled with ACC are more likely to be inhibited during training. However, there is nothing anywhere in the paper showing these data. How do neurons in general respond to contextual shocks? The methods describe this as the average firing rate during training, normalised to pre-sleep activity. This metric seems a bit coarse and may obscure really important task-relevant dynamics. Are the neurons active at specific times, are they tuned to relevant parts of the task, and do any of these features of the cell activity also relate to the coupling with ACC? Similarly, how did the authors mitigate the influence of electrical artefacts caused by the foot shock in their recordings? Again, there is a huge amount of data here that is not being described, and likely holds very valuable information about what is actually happening. The paper would really benefit from the inclusion of these data in an accessible form, such as heatmaps of spiking, how these patterns change over time, and around e.g., foot shock, etc. Also key is how these features are altered by the variability of learning across subjects.

      (3) A number of the effects are presented by comparing a statistically significant effect to a non-statistically significant effect (e.g. in Figure 2b, Figure 2d, Figure 4 b,c, and others). This isn't really valid - the key test that the two groups are different is either with a direct test of the difference or an interaction term in an e.g., ANOVA test. In some places, I am not sure the same conclusions will be drawn from the data with these tests.

      (4) To what extent is defining superficial and deep CA1 neurons solely by ripple waveform an accepted method? Of the two papers referenced for this approach, one is a 2-photon calcium imaging paper that does not do electrical recordings (as far as I am aware), and the second uses this as a descriptor after defining the positions of units on an array. It would be good to clarify how accepted this is, and also how robust this is. At the very least, some kind of metric or walkthrough in the supplement as to how this was done, and how well each cell was classified and with what confidence, or some metric of how distinct and separate the two populations were (or was it just a smudge).

      (5) In the optogenetic experiment in Figure 5, the effect on the CA1 sup neurons seems to be driven by changes in a small subpopulation of this group, with no change in the others. Related to point 2, is there anything else in the data that can pull out what these cells are? More detailed analysis of the firing of these neurons might pull out something really interesting.

      (6) Related to this - a number of comparisons simply pool neurons across mice and analyse them as if independent. This has often been done in the past, but it would be better to use an approach that accounts for the interdependence of neurons recorded from the same mouse at the same time (such as a hierarchical model). While this is complex, a simpler approach would just be to plot the summary data also per mouse. For example, in Figure 5, how are the neurons inhibited by ACC activation distributed across the different mice? Is the level of inhibition related to how well the mice learned the CS-US association?

      (7) Figure 6 is interesting, but very preliminary. None of the effects are quantified, and one of the cell types is not identified. I think some proper analysis needs to be done, again across mice, to be able to draw conclusions from these data.

      (8) Finally, in general, I felt that the way the paper was written was very hard to follow, often relying on very processed levels of analysis that were hard to relate back to the raw traces and their biological meaning. In general taking more words to really simply and fully explain each analysis, and taking the words and figures to walk through how each analysis was done and what it tells us about the neuronal data/biology would be really beneficial, especially to someone who is not an extracellular electrophysiologist or immersed in the immediate field.

      In summary, while this manuscript explores an intriguing hypothesis about pre-learning circuit dynamics, it is currently held back by insufficient clarity in behavioural analysis, data presentation, and statistical quantification. Addressing these core issues would greatly improve interpretability and confidence in the findings.
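
      Weakness (3) above asks for a direct test of the group difference rather than a comparison of one significant and one non-significant result. One possible form of such a test, sketched here under stated assumptions, is a label-permutation test on per-neuron change scores (post minus pre). The function name, the example values, and the choice of a permutation test are illustrative assumptions, not the authors' analysis or data. The sketch also treats neurons as independent, which weakness (6) cautions against, so a per-mouse or hierarchical version would be preferable in practice.

```python
import random
from statistics import mean


def permutation_test_group_difference(changes_a, changes_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean change between groups.

    changes_a, changes_b: per-neuron change scores (e.g. post minus pre).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(changes_a) - mean(changes_b)
    pooled = list(changes_a) + list(changes_b)
    n_a = len(changes_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffle group labels by shuffling the pooled scores.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical change scores, for illustration only (not data from the paper).
group_a = [-0.30, -0.25, -0.40, -0.35, -0.20, -0.28]
group_b = [0.02, -0.01, 0.05, 0.00, -0.03, 0.01]
diff, p = permutation_test_group_difference(group_a, group_b)
print(f"difference in mean change = {diff:.3f}, permutation p = {p:.4f}")
```

      The point of the design is that a single statistic (the difference in change between groups) is tested directly, so the conclusion does not rest on one group crossing a significance threshold while the other does not.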

    5. Author response:

      We would like to thank the reviewers and the editorial team for all their thoughtful and constructive feedback. The reviewers provided many helpful comments which we will work to incorporate in our resubmission as we believe they will significantly enhance the quality of our manuscript.

      An overarching critique shared among reviewers was regarding limitations in our datasets. Namely, lower N-values for certain groups make some conclusions less reliable. We acknowledge this limitation and will add more experiments to address this concern. Additionally, attention was drawn to our reliance on using the generalized linear model (GLM) for making claims about rebalancing and learning-related changes. To address this, we will work to include additional analyses such as ACC spike-triggered average CA1sup responses, cross-covariances between ACC and CA1sup cells in post-task sleep, and ripple-triggered cross-correlations, among others as per reviewer recommendations. We will also provide a deeper analysis of the weights of CA1 neurons in our GLM analysis and their specific features during learning. In accordance, we will provide a clearer description of our learning paradigm including performance data for each animal and how performance relates to our analyses. Overall, we will include more analyses of our datasets across various task events such as recall, to make more efficient use of the full repertoire of our recordings.

      Concerns were also raised regarding some aspects of our statistical analyses. During revision, we will ensure we select the most appropriate statistical measure for each of our tests. Our paper implements the use of tetrode recordings to assess sublayer identification. This approach comes with limitations, and in our resubmission, we will provide a more detailed explanation of those limitations along with a more thorough description of our measures to mitigate them.

      Lastly, in our follow-up submission we will work to improve the written clarity of findings. Specifically, we will simplify and better explain our findings and provide clearer justification for our interpretations and choice of analyses.

    1. eLife Assessment

      This revised paper provides valuable findings that altruistic tendency during moral decision-making is gain/loss context-dependent and oxytocin can restore the absence of altruistic choices in the loss domain. The methods and analyses are solid, yet the study could still benefit from better overall framing and more clarity and precision in the definition of key constructs, as pointed out by reviewers. If these concerns are addressed, this study would be of interest to social scientists and neuroscientists who work on moral decision-making and oxytocin.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether hyperaltruistic preference is modulated by decision context and tested how oxytocin (OXT) may modulate this process. Using an adapted version of a previously well-established moral decision-making task, healthy human participants in this study make decisions in which one option gains more (or loses less, termed the context) while inducing more painful shocks to either themselves or another person (the recipient). The alternative choice is always less gain (or more loss) with less pain. Through a series of regression analyses, the authors reported that hyperaltruistic preference can only be found in the gain context but not in the loss context; however, OXT reestablished the hyperaltruistic preference in the loss context, similar to that in the gain context.

      Strengths:

      This is a solid study that directly adapted a previously well-established task and the analytical pipeline to assess hyperaltruistic preference in separate decision contexts. Context-dependent decisions have gained more and more attention in literature in recent years, hence this study is timely. It also links individual traits (via questionnaires) with task performance, to test potential individual differences. The OXT study is done with great methodological rigor, including pre-registration. Both studies have proper power analysis to determine the sample size.

      Weaknesses:

      Despite the strengths, multiple analytical decisions have to be explained, justified, or clarified. Also, there is scope to enhance the clarity and coherence of the writing - as it stands, readers will have to go back and forth to search for information. Last, it would be helpful to add line numbers in the manuscript during the revision, as this will help all reviewers to locate the parts we are talking about.

      Introduction:
      (1) The introduction is somewhat unmotivated, with key terms/concepts left unexplained until relatively late in the manuscript. One of the main focuses in this work is "hyperaltruistic", but how is this defined? It seems that the authors take the meaning of "willing to pay more to reduce other's pain than their own pain", but is this what the task is measuring? Did participants ever need to PAY something to reduce the other's pain? Note that some previous studies indeed allow participants to pay something to reduce other's pain. And what makes it "HYPER-altruistic" rather than simply "altruistic"? Plus, in the intro, the authors mentioned that the "boundary conditions" remain unexplored, but this idea is never touched again. What do boundary conditions mean here in this task? How do the results/data help with finding out the boundary conditions? Can this be discussed within wider literature in the Discussion section? Last, what motivated the authors to examine decision context? It comes somewhat out of the blue that the opening paragraph states that "We set out to [...] decision context", but why? Are there other important factors? Why is decision context more important than those others?

      Experimental design:
      (2) The experiment per se is largely solid, as it followed a previously well-established protocol. But I am curious about how the participants were instructed. Did the experimenter ever mention the word "help" or "harm" to the participants? It would be helpful to include the exact instructions in the SI.

      (3) Relatedly, the experimental details were not quite comprehensive in the main text. Indeed, Methods come after the main text, but to be able to guide readers to understand what was going on, it would be very helpful if the authors could include some necessary experimental details at the beginning of the Results section.

      Statistical analysis
      (3) One of the main analyses uses the harm aversion model (Eq1) and the results section keeps referring to one of the key parameters of it (ie, k). However, it is difficult to understand the text without going to the Methods section below. Hence it would be very helpful to repeat the equation also in the main text. A similar idea goes to the delta_m and delta_s terms - it will be very helpful to give a clear meaning of them, as nearly all analyses rely on knowing what they mean.

      (4) There is one additional parameter gamma (choice consistency) in the model. Did the authors also examine the task-related difference of gamma? This might be important as some studies have shown that the other-oriented choice consistency may differ in different prosocial contexts.

      (5) I am not fully convinced that two types of models are needed: the harm aversion model and the logistic regression models. Indeed, the models look similar, and the authors have acknowledged that. But I wonder if there is a way to combine them? For example:
      Choice ~ delta_V * context * recipient (*Oxt_v._placebo)
      The calculation of delta_V follows Equation 1.
      Or the conceptual question is: if the authors were interested in the specific and independent contributions of delta_m and delta_s to behavior, as in their logistic model, why did they first examine the harm aversion model, where a parameter k controls the trade-off? One way to find out is to properly run the different models and perform model comparison. In the end, it would be beneficial to focus only on the "winning" model to draw inferences.

      (6) The interpretation of the main OXT results needs to be more cautious. According to the operationalization, "hyperaltruistic" is the reduction of pain of others (higher % of choosing the less painful option) relative to the self. But relative to the placebo (as baseline), OXT did not increase the % of choosing the less painful option for others, rather, it decreased the % of choosing the less painful option for themselves. In other words, the degree of reducing other's pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT. I think this needs to be unpacked, and some of the wording needs to be changed. I am not very familiar with the OXT literature, but I believe it is very important to differentiate whether OXT is doing something on self-oriented actions vs other-oriented actions. Relatedly, for results such as that in Fig5A, it would be helpful to not only look at the difference, but also the actual magnitude of the sensitivity to the shocks, for self and others, under OXT and placebo.
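
      Points (3) and (5) above turn on what k, gamma, delta_m and delta_s mean. The manuscript's Equation 1 is not reproduced in this review, so the following is only a minimal sketch, assuming the form used in the earlier work this task follows (Crockett et al., 2014): a value difference delta_V = (1 - k) * delta_m - k * delta_s passed through a logistic choice rule with consistency gamma. The function names and all numerical values are hypothetical.

```python
import math


def delta_v(delta_m, delta_s, k):
    """Subjective value difference between the more and the less painful option.

    delta_m: extra money offered by the more painful option.
    delta_s: extra shocks delivered by the more painful option.
    k: harm aversion in [0, 1]; k = 0 ignores shocks, k = 1 ignores money.
    """
    return (1.0 - k) * delta_m - k * delta_s


def p_choose_more_painful(delta_m, delta_s, k, gamma):
    """Logistic choice rule; gamma is the choice-consistency parameter."""
    return 1.0 / (1.0 + math.exp(-gamma * delta_v(delta_m, delta_s, k)))


# Hypothetical parameter values, for illustration only.
k_self, k_other, gamma = 0.4, 0.6, 1.0
p_self = p_choose_more_painful(delta_m=5.0, delta_s=3.0, k=k_self, gamma=gamma)
p_other = p_choose_more_painful(delta_m=5.0, delta_s=3.0, k=k_other, gamma=gamma)
print(f"P(more painful | self)  = {p_self:.3f}")
print(f"P(more painful | other) = {p_other:.3f}")
print("hyperaltruistic pattern:", k_other > k_self)
```

      Under this assumed form, a hyperaltruistic pattern corresponds to a larger k for the other person than for the self, which lowers the probability of choosing the more painful option when the recipient is the other.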

      Comments on revisions:

      I did not change my original public review, as I think it can still be helpful for the field to see the reasoning and argument.

      For the revision, the authors have done a thorough job of addressing my previous comments and questions.

      My only remaining request is a clear definition of hyperaltruism. As it stands, hyperaltruism refers to "people's willingness to pay more to reduce other's pain than their own pain", ie, the "hyper" bit is considered with respect to "self". But shouldn't hyperaltruism be defined in contrast to "normal" altruism?

      It is fine that it follows a previously published work (Crockett et al., 2014), but it would still be necessary to explain/define the construct being tested in a standalone fashion rather than requiring readers to go back to the original work.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors reported two studies where they investigated the context effect of hyperaltruistic tendency in moral decision-making. They replicated the hyperaltruistic moral preference in the gain domain, where participants inflicted electric shocks on themselves or another person in exchange for monetary profits for themselves. In the loss domain, this hyperaltruistic tendency was abolished. Interestingly, oxytocin administration reinstated the hyperaltruistic tendency in the loss domain. The authors also examined the correlation between individual differences in utilitarian psychology and the context effect of hyperaltruistic tendency.

      Strengths:

      (1) The research question - the boundary condition of hyperaltruistic tendency in moral decision-making and its neural basis - is theoretically important.
      (2) Manipulating the brain via pharmacological means offers causal understanding of the neurobiological basis of the psychological phenomenon in question.
      (3) Individual difference analysis reveals interesting moderators of the behavioral tendency.

      Weaknesses:

      (1) The theoretical hypothesis needs to be better justified. There are studies addressing the neurobiological mechanism of hyperaltruistic tendency, which the authors unfortunately skipped entirely.
      (2) There are some important inconsistencies between the preregistration and the actual data collection/analysis, which the authors did not justify.
      (3) Some of the exploratory analysis seems underpowered (e.g., large multiple regression models with only about 40 participants).
      (4) Inaccurate conceptualization of utilitarian psychology and the questionnaire used to measure it.

      Comments on revisions:

      The authors have addressed the weaknesses in the second round of revision.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to index individual variation in decision-making when decisions pit the interests of the self (gains in money, potential for electric shock) against the interests of an unknown stranger in another room (potential for unknown shock). In addition, the authors conducted an additional study in which male participants were either administered intranasal oxytocin or placebo before completing the task to identify the role of oxytocin in moderating task responses. Participants' choice data was analyzed using a harm aversion model in which choices were driven by the subjective value difference between the less and more painful options.

      Strengths:

      Overall, I think this is a well-conducted, interesting, and novel set of research studies exploring decision-making that balances outcomes for the self versus a stranger, and the potential role of the hormone oxytocin (OT) in shaping these decisions. The pain component of the paradigm is well designed, as is the decision-making task, and overall the analyses were well suited to evaluating and interpreting the data. Advantages of the task design include the absence of deception, e.g., the use of a real study partner and real stakes, as a trial from the task was selected at random after the study and the choice the participant made was actually executed.

      Weaknesses:

      The primary weakness of the paper concerns its framing. Although it purports to be measuring "hyper-altruism," which is the same term used in prior similar (although not identical) designs, I do not believe the task constitutes altruism, but rather the decision to engage, or not engage, in instrumental aggression.

      I continue to believe that when in the "other" trials the only outcome possible for the study partner is pain, and the only outcome possible for the participant is monetary gain, these trials measure decisions about instrumental aggression. That is the exact definition of instrumental aggression: causing others harm for personal gain. Altruism is not equivalent to refraining from engaging in instrumental aggression, although some similar mechanisms may support both. True altruism would be to accept shocks to the self for the other's benefit (e.g., money). The interpretation of this task as assessing instrumental aggression is supported by the fact that only the Instrumental Harm subscale of the OUS was associated with outcomes in the task, but not the Impartial Benevolence subscale. By contrast, the IB subscale is the one more consistently associated with altruism (e.g., Kahane et al., 2018; Amormino et al., 2022). I believe it is important for scientific accuracy for the paper, including the title, to be rewritten to reflect what it is testing.

      Although I recognize similar tasks have been previously characterized as "hyper-altruism" I do not believe that is sufficient justification for continuing to promulgate this descriptor without any caveats. I hope the authors will engage more seriously with the idea that this is what the task is measuring.

      Relatedly, in the introduction, I believe it would be important to discuss the non-symmetry of moral obligations related to help/harm--we have obligations not to harm strangers but no obligation to help strangers. This is another reason I do not think the term "hyper altruism" is a good description for this task--given it is typically viewed as morally obligatory not to harm strangers, choosing not to harm them is not "hyper" altruistic (and again, I do not view it as obviously altruism at all).

    5. Author response:

      The following is the authors' response to the original reviews.

      Reviewer #1:

      Despite the strengths, multiple analytical decisions have to be explained, justified, or clarified. Also, there is scope to enhance the clarity and coherence of the writing - as it stands, readers will have to go back and forth to search for information. Last, it would be helpful to add line numbers in the manuscript during the revision, as this will help all reviewers to locate the parts we are talking about.

      We thank the reviewer for the suggestions and have added line numbers to the revised manuscript.

      (1) Introduction:

      The introduction is somewhat unmotivated, with key terms/concepts left unexplained until relatively late in the manuscript. One of the main focuses in this work is "hyperaltruistic", but how is this defined? It seems that the authors take the meaning of "willing to pay more to reduce other's pain than their own pain", but is this what the task is measuring? Did participants ever need to PAY something to reduce the other's pain? Note that some previous studies indeed allow participants to pay something to reduce other's pain. And what makes it "HYPER-altruistic" rather than simply "altruistic"?

      As the reviewer noted, we adopted a well-established experimental paradigm to study the context-dependent effect on hyper-altruism. Altruism refers to the fact that people take others' welfare into account when making decisions that concern both parties. Research paradigms investigating altruistic behavior typically use a social decision task that requires participants to choose between options where their own financial interests are pitted against the welfare of others (FeldmanHall et al., 2015; Hu et al., 2021; Hutcherson et al., 2015; Teoh et al., 2020; Xiong et al., 2020). On the other hand, the hyperaltruistic tendency emphasizes subjects' higher valuation of others' pain than of their own pain (Crockett et al., 2014, 2015, 2017; Volz et al., 2017). One example of hyperaltruism would be the following scenario: the subject is willing to forgo $2 to reduce others' pain by 1 unit (social-decision task) but only willing to forgo $1 to reduce his/her own pain by the same amount (self-decision task) (Crockett et al., 2014). On the contrary, if subjects are willing to forgo less money to reduce others' suffering in the social decision task than in the self-decision task, then it can be claimed that no hyperaltruism is observed. Therefore, hyperaltruistic preference can only be measured by collecting subjects' choices in both the self and social decision tasks and comparing the choices in both tasks.

      In our task, as in the studies before ours (Crockett et al., 2014, 2015, 2017; Volz et al., 2017), subjects in each trial were faced with two options with different levels of pain for others and monetary payoffs for themselves. Based on subjects’ choice data, we can infer through regression analysis how willing subjects were to trade one unit of monetary payoff in exchange for reducing others’ pain (see Figure 1 and Methods for the experimental details). We have rewritten the introduction and methods sections to make this point clearer to the audience.

      Plus, in the intro, the authors mentioned that the "boundary conditions" remain unexplored, but this idea is never touched again. What do boundary conditions mean here in this task? How do the results/data help with finding out the boundary conditions? Can this be discussed within wider literature in the Discussion section?

      Boundary conditions here specifically refer to the variables or decision contexts that determine whether hyperaltruistic behavior can be elicited. Individual personality traits, motivation and social relationships may all be boundary conditions affecting the emergence of hyperaltruistic behavior. In our task, we specifically focused on the valence of the decision context (gain vs. loss), since previous studies only tested the hyperaltruistic preference in the gain context and the introduction of the loss context might bias subjects’ hyperaltruistic behavior through implicit moral framing.

      We have explained the boundary conditions in the revised introduction (Lines 45 ~ 49).

      “However, moral norm is also context dependent: vandalism is clearly against social and moral norms yet vandalism for self-defense is more likely to be ethically and legally justified (the Doctrine of necessity). Therefore, a crucial step is to understand the boundary conditions for hyperaltruism.”

      Last, what motivated the authors to examine the decision context? It comes somewhat out of the blue that the opening paragraph states that "We set out to [...] decision context", but why? Are there other important factors? Why decision context is more important than studying those others?

      We thank the reviewer for the comment. The hyperaltruistic preference was originally demonstrated between conditions where subjects’ personal monetary gain was pitted against others’ pain (social condition) or against subjects’ own suffering (self condition) (Crockett et al., 2014). Follow-up studies found that subjects also exhibited strong egoistic tendencies if they instead needed to harm themselves for others’ benefit in the social condition (by flipping the recipients of monetary gain and electric shocks) (Volz et al., 2017). However, these studies have primarily focused on gain contexts, neglecting the fact that valence could also be an influential factor in biasing subjects’ behavior (given the difference between gain and loss processing in humans). It is likely that replacing monetary gains with losses in the money-pain trade-off task might bias subjects’ hyperaltruistic preference due to heightened vigilance or negative emotions in the face of potential loss (such as loss aversion) (Kahneman & Tversky, 1979; Liu et al., 2020; Pachur et al., 2018; Tom et al., 2007; Usher & McClelland, 2004; Yechiam & Hochman, 2013). Another possibility is that gain and loss contexts may elicit different subjective moral perceptions (or internal moral framings) in participants, affecting their hyperaltruistic preferences (Liu et al., 2017; Losecaat Vermeer et al., 2020; Markiewicz & Czupryna, 2018; Wu et al., 2018). In our manuscript, we did not strive to compare which factors might be more important in eliciting hyperaltruistic behavior, but rather to demonstrate the crucial role played by the decision context and to show that internal moral framing could be the mediating factor driving subjects’ hyperaltruistic behavior. In fact, we speculate that the egoistic tendencies found in the Volz et al. (2017) study were partly driven by subjects’ failure to engage the proper internal moral framing in the social condition (harm for self, see Volz et al., 2017 for details).

      (2) Experimental Design:

      (2a) The experiment per se is largely solid, as it followed a previously well-established protocol. But I am curious about how the participants got instructed? Did the experimenter ever mention the word "help" or "harm" to the participants? It would be helpful to include the exact instructions in the SI.

      In the instructions, we avoided words such as “harm”, “help”, or other terms reminding subjects about the moral judgement of the decisions they were about to make. Instead, we presented the options in a neutral and descriptive manner, focusing only on the relevant components (shocks and money). The instructions for all four conditions are shown in supplementary Fig. 9.

      (2b) Relatedly, the experimental details were not quite comprehensive in the main text. Indeed, the Methods come after the main text, but to be able to guide readers to understand what was going on, it would be very helpful if the authors could include some necessary experimental details at the beginning of the Results section.

      We thank the reviewer for the suggestion. We have now provided a brief introduction of the experimental details in the revised results section (Lines 125 ~ 132).

      “Prior to the money-pain trade-off task, we individually calibrated each subject’s pain threshold using a standard procedure[4–6]. This allowed us to tailor a moderate electric stimulus that corresponded to each subject’s subjective pain intensity. Subjects then engaged in 240 decision trials (60 trials per condition), acting as the “decider” and trading off between monetary gains or losses for themselves and the pain experienced by either themselves or an anonymous “pain receiver” (gain-self, gain-other, loss-self and loss-other, see Supplementary Fig. 8 for the instructions and also see methods for details).”

      (3) Statistical Analysis

      (3a) One of the main analyses uses the harm aversion model (Eq1) and the results section keeps referring to one of the key parameters of it (i.e., k). However, it is difficult to understand the text without going to the Methods section below. Hence it would be very helpful to repeat the equation also in the main text. A similar idea goes to the delta_m and delta_s terms - it will be very helpful to give a clear meaning of them, as nearly all analyses rely on knowing what they mean.

      We thank the reviewer for the suggestion. We have now added the equation of the harm aversion model and provided a more detailed description of the equation in the main text (Lines 150 ~ 155).

      “We also modeled subjects’ choices using an influential model where subjects’ behavior could be characterized by the harm (electric shock) aversion parameter κ, reflecting the relative weights subjects assigned to Δm and Δs, the objective difference in money and shocks between the more and less painful options, respectively (ΔV = (1-κ)Δm - κΔs, Eq. 1; see Methods for details)[4–6]. Higher κ indicates that higher sensitivity is assigned to Δs than Δm and vice versa.”

      (3b) There is one additional parameter gamma (choice consistency) in the model. Did the authors also examine the task-related difference of gamma? This might be important as some studies have shown that the other-oriented choice consistency may differ in different prosocial contexts.

      To examine the task-related difference of choice consistency (γ), we compared the performance of 4 candidate models:

      Model 1 (M1): The choice consistency parameter γ remains constant across shock recipients (self vs. other) and decision contexts (gain vs. loss).

      Model 2 (M2): γ differs between the self- and other-recipient conditions, with γ<sub>self</sub> and γ<sub>other</sub> representing the choice consistency when pain is inflicted on the subject or on the other recipient, respectively.

      Model 3 (M3): γ differs between the gain and loss conditions, with γ<sub>gain</sub> and γ<sub>loss</sub> representing the choice consistencies in the gain and loss contexts, respectively.

      Model 4 (M4): γ varies across four conditions, with γ<sub>self-gain</sub>, γ<sub>other-gain</sub>, γ<sub>self-loss</sub> and γ<sub>other-loss</sub> capturing the choice consistency in each condition.

      Supplementary Fig. 10 shows that, after fitting all the models to subjects’ choice data, model 1 (M1) performed best among the four candidate models in both studies (1 & 2), with the lowest Bayesian Information Criterion (BIC). Therefore, we conclude that factors such as the shock recipient (self vs. other) and decision context (gain vs. loss) did not significantly influence subjects’ choice consistency, and we report model results using the single choice consistency parameter.
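      The BIC comparison described above can be sketched as follows. This is a minimal illustration that assumes the standard definition BIC = k·ln(n) - 2·ln(L). The log-likelihoods and parameter counts are hypothetical placeholders chosen only to show the mechanics; they are not values from the reported fits, and the assumption of four κ parameters per model is likewise illustrative.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits for one subject: (maximized log-likelihood, free parameters).
# Assumed parameter counts: four kappa values plus 1, 2, 2 or 4 gamma values.
candidate_fits = {
    "M1": (-120.0, 5),  # single gamma
    "M2": (-119.2, 6),  # gamma_self, gamma_other
    "M3": (-119.5, 6),  # gamma_gain, gamma_loss
    "M4": (-118.6, 8),  # one gamma per condition
}
n_trials = 240  # decision trials per subject, as described in the task

bic_scores = {
    name: bic(log_lik, k, n_trials) for name, (log_lik, k) in candidate_fits.items()
}
best_model = min(bic_scores, key=bic_scores.get)
print(best_model, {name: round(score, 2) for name, score in bic_scores.items()})
```

      With these placeholder numbers, the small gains in log-likelihood from the extra γ parameters do not offset the ln(n) penalty per parameter, which is the pattern that would favor the single-γ model.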

      (3c) I am not fully convinced that the authors included two types of models: the harm aversion model and the logistic regression models. Indeed, the models look similar, and the authors have acknowledged that. But I wonder if there is a way to combine them? For example:

      Choice ~ delta_V * context * recipient (*Oxt_v._placebo)

      The calculation of delta_V follows Equation 1.

      Or the conceptual question is, if the authors were interested in the specific and independent contribution of delta_m and delta_s to behavior, as their logistic model did, why did the authors examine the harm aversion first, where a parameter k is controlling for the trade-off? One way to find it out is to properly run different models and run model comparisons. In the end, it would be beneficial to only focus on the "winning" model to draw inferences.

      The reviewer raised an excellent point here. According to the logistic regression model, we have:

      Where P is the probability of selecting the less harmful option. Similarly, if we combine Eq. 1 (ΔV = (1-κ)Δm - κΔs) and Eq. 2 of the harm aversion model, we have:

      If we ignore the constant term β<sub>0</sub> from the logistic regression model, the harm aversion model is simply a reparameterization of the logistic regression model. The harm aversion model was implemented first to derive the harm aversion parameter (κ), which is a parameter in the range [0, 1] that quantifies how subjects value the relative contribution of Δm and Δs between options in their decision processes. Since previous studies used the term κ<sub>other</sub>-κ<sub>self</sub> to define the magnitude of hyperaltruistic preference, we adopted a similar approach to compare our results with previous research under the same theoretical framework. However, in order to investigate the independent contributions of Δm and Δs, we have to take γ into account. The coefficients β<sub>Δm</sub> and β<sub>Δs</sub> in the logistic regression model are not necessarily correlated by nature, whereas in the harm aversion model the coefficients (1-κ) and κ are always strictly negatively correlated (see Eq. 1). Only after multiplying by γ does the correlation between γ(1-κ) and γκ vary, depending on the specific distributions of γ and κ. In summary, we followed the approach of previous research and estimated the harm aversion parameter κ to compare our results with previous studies and to capture the relative influence of Δm and Δs. When we studied the contextual effects (gain vs. loss or placebo vs. control) on subjects’ behavior, we further investigated the contextual effect on how subjects evaluated Δm and Δs, respectively. The two models (logistic regression model and harm aversion model) in our study are mathematically the same and are not competing candidate models. Instead, they represent different aspects from which our data can be examined.
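      The correspondence between the two parameterizations can be made concrete with a short sketch. It assumes a logistic (softmax) choice rule applied to γ·ΔV with no intercept, which is an assumption about the form of Eq. 2 since that equation is not reproduced in this response. Which option the resulting probability refers to depends on the sign convention, which is also assumed here. The function names are illustrative only.

```python
import math


def harm_to_logistic(kappa: float, gamma: float):
    """Map (kappa, gamma) to logistic weights on delta_m and delta_s.

    Assumes logit(P) = gamma * dV with dV = (1 - kappa) * dm - kappa * ds.
    """
    beta_m = gamma * (1.0 - kappa)
    beta_s = -gamma * kappa
    return beta_m, beta_s


def logistic_to_harm(beta_m: float, beta_s: float):
    """Invert the mapping: recover (kappa, gamma) from the two logistic weights."""
    gamma = beta_m - beta_s  # equals gamma * (1 - kappa) + gamma * kappa
    if gamma == 0:
        raise ValueError("kappa is undefined when both weights are zero")
    kappa = -beta_s / gamma
    return kappa, gamma


def choice_probability(logit: float) -> float:
    """Logistic function applied to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-logit))


# Illustrative parameter values and one illustrative trial.
kappa, gamma = 0.3, 0.5
delta_m, delta_s = 4.0, 6.0

beta_m, beta_s = harm_to_logistic(kappa, gamma)
p_harm_model = choice_probability(gamma * ((1 - kappa) * delta_m - kappa * delta_s))
p_logistic = choice_probability(beta_m * delta_m + beta_s * delta_s)
recovered_kappa, recovered_gamma = logistic_to_harm(beta_m, beta_s)
```

      Under these assumptions both parameterizations give the same choice probability on every trial, and κ is recoverable from the two regression weights only through their ratio, which is why γ has to be taken into account when the weights on Δm and Δs are examined separately.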

      We also compared the harm aversion model with and without the constant term β<sub>0</sub> in the choice function. Adding a constant term β<sub>0</sub>, the above Equation 2 becomes:

      As the following figure shows, the hyperaltruistic parameters (κ<sub>other</sub>-κ<sub>self</sub>) calculated from the harm aversion model with the constant term (panels A & B) have almost identical patterns to those from the model without the constant term (panels C & D, i.e. Figs. 2B & 4B in the original manuscript) in both studies.

      Author response image 1.

      (3d) The interpretation of the main OXT results needs to be more cautious. According to the operationalization, "hyperaltruistic" is the reduction of pain of others (higher % of choosing the less painful option) relative to the self. But relative to the placebo (as baseline), OXT did not increase the % of choosing the less painful option for others, rather, it decreased the % of choosing the less painful option for themselves. In other words, the degree of reducing other's pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT. I think this needs to be unpacked, and some of the wording needs to be changed. I am not very familiar with the OXT literature, but I believe it is very important to differentiate whether OXT is doing something on self-oriented actions vs other-oriented actions. Relatedly, for results such as that in Figure 5A, it would be helpful to not only look at the difference but also the actual magnitude of the sensitivity to the shocks, for self and others, under OXT and placebo.

      We thank the reviewer for this thoughtful comment. As the reviewer correctly pointed out, “hyperaltruism” can be defined as “higher % of choosing the less painful option to the others relative to the self”. Closer examination of the results showed that both the degree of reducing others’ pain and the degree of reducing their own pain decreased under OXT (Figure 4A). More specifically, our results do not support the claim that “In other words, the degree of reducing others’ pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT.” Instead, the results show a significant reduction in the choice of the less painful option under OXT treatment for both the self and other conditions (interaction effect of OXT vs. placebo and self vs. other: F<sub>1,45</sub>=16.812, P < 0.001, η<sup>2</sup>=0.272; simple effect of OXT vs. placebo in the self condition: F<sub>1,45</sub>=59.332, P < 0.001, η<sup>2</sup>=0.569; OXT vs. placebo in the other condition: F<sub>1,45</sub>=14.626, P < 0.001, η<sup>2</sup>=0.245; repeated-measures ANOVA, see Figure 4A).

      We also performed mixed-effect logistic regression analyses in which subjects’ choices were regressed against Δs and Δm in the different valence (gain vs. loss) and recipient (self vs. other) conditions in both studies 1 & 2 (Supplementary Figs. 1 & 6). In the figure above, where we replot Supplementary Fig. 6 together with panel B (included as Supplementary Fig. 8 in the supplementary materials), we found a significant treatment × Δ<sub>s</sub> (difference in shock magnitude between the more and less painful options) interaction effect (β=-0.136±0.029, P < 0.001, 95% CI=[-0.192, -0.079]), indicating that subjects’ sensitivities towards pain were indeed different between the placebo and OXT treatments for both the self and other conditions. Furthermore, the significant four-way Δ<sub>s</sub> × treatment (OXT vs. placebo) × context (gain vs. loss) × recipient (self vs. other) interaction effect (β=0.125±0.053, P=0.018, 95% CI=[0.022, 0.228]) in the regression analysis, followed by significant simple effects (in the OXT treatment: Δ<sub>s</sub> × recipient effect in the gain context: F<sub>1,45</sub>=7.622, P < 0.008, η<sup>2</sup>=0.145; Δ<sub>s</sub> × recipient effect in the loss context: F<sub>1,45</sub>=7.966, P=0.007, η<sup>2</sup>=0.150), suggested that under OXT treatment participants showed a greater sensitivity toward Δ<sub>s</sub> in the other condition than in the self condition (see asterisks in the OXT condition in panel B), thus restoring the hyperaltruistic behavior in the loss context.

      As the reviewer suggested, OXT’s effect on hyperaltruism does manifest separately on subjects’ harm sensitivities in self- and other-oriented actions. We followed the reviewer’s suggestions and examined the actual magnitude of the sensitivities to shocks for both the self and other conditions (panel B in the figure above). It is clear that the administration of OXT (compared to the placebo treatment, panel B in the figure above) significantly reduced participants’ pain sensitivity (treatment × Δ<sub>s</sub>: β=-0.136±0.029, P < 0.001, 95% CI=[-0.192, -0.079]), yet also restored the harm sensitivity patterns in both the gain and loss conditions. These results are included in the supplementary figures (6 & 8) as well as in the main text.

      Recommendations:

      (1) For Figures 2A-B, it would be great to calculate the correlation separately for gain and loss, as in other figures.

      We speculate that the reviewer is referring to Figures 3A & B. We did not present the correlations separately for the gain and loss contexts because the correlation between an individual’s IH (instrumental harm), IB (impartial beneficence) and hyperaltruistic preferences was not significantly modulated by the contextual factors. The interaction effects in both Figs. 3A & B and Supplementary Fig. 5 (also see Tables S1 & S2) are as follows: Study 1, valence × IH effect: β=0.016±0.022, t<sub>152</sub>=0.726, P=0.469; valence × IB effect: β=0.004±0.031, t<sub>152</sub>=0.115, P=0.908; Study 2 placebo condition, valence × IH effect: β=0.018±0.024, t<sub>84</sub>=0.030, P=0.463; valence × IB effect: β=0.051±0.030, t<sub>84</sub>=1.711, P=0.702. We have added these statistics to the main text following the reviewer’s suggestions.

      (2) "by randomly drawing a shock increment integer Δs (from 1 to 19) such that [...] did not exceed 20 (S+ ≤ 20)." I am not sure if a random drawing following a uniform distribution can guarantee S is smaller than 20. More details are needed. Same for the monetary magnitude.

      We are sorry for the lack of clarity in the method description. For the task design, we adopted the original design from previous literature (Crockett et al., 2014, 2017). More specifically:

      “Specifically, each trial was determined by a combination of the differences of shocks (Δs, ranging from 1 to 19, with increment of 1) and money (Δm, ranging from ¥0.2 to ¥19.8, with increment of ¥0.2) between the two options, resulting in a total of 19×99=1881 pairs of [Δs, Δm]. To ensure the trials were suitable for most subjects, we evenly distributed the desired ratio Δm / (Δs + Δm) between 0.01 and 0.99 across 60 trials for each condition. For each trial, we selected the closest [Δs, Δm] pair from the [Δs, Δm] pool to the specific Δm / (Δs + Δm) ratio, which was then used to determine the actual money and shock amounts of two options. The shock amount (S<sub>less</sub>) for the less painful option was an integer drawn from the discrete uniform distribution [1-19], constrained by S<sub>less</sub> + Δs < 20. Similarly, the money amount (M<sub>less</sub>) for the less painful option was drawn from a discrete uniform distribution [¥0.2 - ¥19.8], with the constraint of M<sub>less</sub> + Δm < 20. Once S<sub>less</sub> and M<sub>less</sub> were selected, the shock (S<sub>more</sub>) and money (M<sub>more</sub>) magnitudes for the more painful option were calculated as: S<sub>more</sub> = S<sub>less</sub> + Δs, M<sub>more</sub> = M<sub>less</sub> + Δm”

      We have added these details to the methods section (Lines 520-533).
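      The trial-generation procedure quoted above can be sketched as follows. This is a minimal illustration, not the code used in the study. It makes one explicit assumption: the upper bounds are treated as inclusive (S<sub>more</sub> ≤ 20 and M<sub>more</sub> ≤ ¥20), following the "did not exceed 20" wording quoted by the reviewer, because a strict bound would leave no valid draw when Δs = 19 or Δm = ¥19.8. Money is stored as integer multiples of ¥0.2 to avoid floating-point drift, and all names are hypothetical.

```python
import random

STEP = 0.2  # money increment in yuan; money is stored as integer steps of 0.2


def build_pool():
    """All [ds, dm] pairs: ds in 1..19 shocks, dm in 1..99 steps (0.2 to 19.8 yuan)."""
    return [(ds, dm_steps) for ds in range(1, 20) for dm_steps in range(1, 100)]


def money_ratio(ds, dm_steps):
    """Ratio dm / (ds + dm) for one candidate pair."""
    dm = dm_steps * STEP
    return dm / (ds + dm)


def generate_trials(n_trials=60, seed=0):
    """Generate one condition's trials from evenly spaced target ratios."""
    rng = random.Random(seed)
    pool = build_pool()
    trials = []
    for i in range(n_trials):
        # Target ratios evenly spaced between 0.01 and 0.99.
        target = 0.01 + (0.99 - 0.01) * i / (n_trials - 1)
        # Pick the pool entry whose ratio is closest to the target.
        ds, dm_steps = min(pool, key=lambda pair: abs(money_ratio(*pair) - target))
        # Assumption: bounds are inclusive, so the larger option may equal 20.
        s_less = rng.randint(1, 20 - ds)
        m_less_steps = rng.randint(1, 100 - dm_steps)
        trials.append({
            "delta_s": ds,
            "delta_m": round(dm_steps * STEP, 1),
            "s_less": s_less,
            "s_more": s_less + ds,
            "m_less": round(m_less_steps * STEP, 1),
            "m_more": round((m_less_steps + dm_steps) * STEP, 1),
        })
    return trials


trials = generate_trials()
print(trials[0], trials[-1])
```

      Drawing the smaller amounts from a range that already depends on Δs and Δm guarantees the bound by construction, so no rejection sampling is needed.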

      Reviewer #2:

      (1) The theoretical hypothesis needs to be better justified. There are studies addressing the neurobiological mechanism of hyperaltruistic tendency, which the authors unfortunately skipped entirely.

      Also in recommendation #1:

      (1) In the Introduction, the authors claim that "the mechanistic account of the hyperaltruistic phenomenon remains unknown". I think this is too broad of a criticism and does not do justice to prior work that does provide some mechanistic account of this phenomenon. In particular, I was surprised that the authors did not mention at all a relevant fMRI study that investigates the neural mechanism underlying hyperaltruistic tendency (Crockett et al., 2017, Nature Neuroscience). There, the researchers found that individual differences in hyperaltruistic tendency in the same type of moral decision-making task is better explained by reduced neural responses to ill-gotten money (Δm in the Other condition) in the brain reward system, rather than heightened neural responses to others' harm. Moreover, such neural response pattern is related to how an immoral choice would be judged (i.e., blamed) by the community. Since the brain reward system is consistently involved in Oxytocin's role in social cognition and decision-making (e.g., Dolen & Malenka, 2014, Biological Psychiatry), it is important to discuss the hypothesis and results of the present research in the context of this literature.

      We fully agree with the reviewer that the expression “mechanistic account of the hyperaltruistic phenomenon remains unknown” in our original manuscript can be misleading to the audience. Indeed, we were aware of the major findings in the field and cited all the seminal work on hyperaltruism and its related neural mechanism (Crockett et al., 2014, 2015, 2017). We have changed the text in the introduction to better reflect this point and added further discussion of how oxytocin might play a role:

      “For example, it was shown that the hyperaltruistic preference modulated neural representations of the profit gained from harming others via the functional connectivity between the lateral prefrontal cortex, a brain area involved in moral norm violation, and profit sensitive brain regions such as the dorsal striatum[6].” (Lines 41~45)

      “Oxytocin has been shown to play a critical role in social interactions such as maternal attachment, pair bonding, consociate attachment and aggression in a variety of animal models[42,43]. Humans are endowed with higher cognitive and affective capacities and exhibit far more complex social cognitive patterns[44].” (Lines 86~90)

      (2) There are some important inconsistencies between the preregistration and the actual data collection/analysis, which the authors did not justify.

      Also in recommendations:

      (4) It is laudable that the authors pre-registered the procedure and key analysis of the Oxytocin study and determined the sample size beforehand. However, in the preregistration, the authors claimed that they would recruit 30 participants for Experiment 1 and 60 for Experiment 2, without justification. In the paper, they described a "prior power analysis", which deviated from their preregistration. It is OK to deviate from preregistration, but this needs to be explicitly mentioned and addressed (why the deviation occurred, why the reported approach was justifiable, etc.).

      We sincerely appreciate the reviewer’s thorough assessment of our manuscript. In the more exploratory study 1, we found that the loss decision context effectively diminished subjects’ hyperaltruistic preference. Based on this finding, we pre-registered study 2 and hypothesized that: 1) The administration of OXT may salvage subjects’ hyperaltruistic preference in the loss context; 2) The administration of OXT may reduce subjects’ sensitivities towards electric shocks (but not necessarily their moral preference), due to the well-established results relating OXT to enhanced empathy for others (Barchi-Ferreira & Osório, 2021; Radke et al., 2013) and the processing of negative stimuli (Evans et al., 2010; Kirsch et al., 2005; Wu et al., 2020); and 3) The OXT effect might be context specific, depending on the particular combination of valence (gain vs. loss) and shock recipient (self vs. other) (Abu-Akel et al., 2015; Kapetaniou et al., 2021; Ma et al., 2015).

      As our results suggested, the administration of OXT indeed restored subjects’ hyperaltruistic preference (confirming hypothesis 1, Figure 4A). Also, OXT decreased subjects’ sensitivities towards electric shocks in both the gain and loss conditions (supplementary Fig. 6 and supplementary Fig. 8), consistent with our second hypothesis. We must admit that our hypothesis 3 was rather vague, since a seminal study clearly demonstrated the context-dependent effect of OXT in human cooperation and conflict depending on the group membership of the subjects (De Dreu et al., 2010, 2020). Although our results partially validated our hypothesis 3 (supplementary Fig. 6), we did not make specific predictions as to the direction and the magnitude of the OXT effect.

      The main inconsistency is related to the sample size. When we carried out study 1, we recruited both male and female subjects. After we identified the context effect on the hyperaltruistic preference, we decided to pre-register and perform study 2 (the OXT study). We originally made a rough estimate of 60 male subjects for study 2. While conducting study 2, we also went through the literature of OXT effect on social behavior and realized that the actual subject number around 45 might be enough to detect the main effect of OXT. Therefore, we settled on the number of 46 (study 2) reported in the manuscript. Correspondingly, we increased the subject number in study 1 to the final number of 80 (40 males) to make sure the subject number is enough to detect a small-to-medium effect, as well as to have a fair comparison between study 1 and 2 (roughly equal number of male subjects). It should be noted that although we only reported all the subjects (male & female) results of study 1 in the manuscript, the main results remain very similar if we only focus on the results of male subjects in study 1 (see the figure below). We believe that these results, together with the placebo treatment group results in study 2 (male only), confirmed the validity of our original finding.

      Author response image 2.

      Author response image 3.

      We have included additional texts (Lines 447 ~ 452) in the Methods section for the discrepancy between the preregistered and actual sample sizes in the revised manuscript:

      “It should be noted that in preregistration we originally planned to recruit 60 male subjects for Study 2 but ended up recruiting 46 male subjects (mean age =  years) based on the sample size reported in previous oxytocin studies[57,69]. Additionally, a power analysis suggested that the sample size > 44 should be enough to detect a small to medium effect size of oxytocin (Cohen’s d=0.24, α=0.05, β=0.8) using a 2 × 2 × 2 within-subject design[76].”

      (3) Some of the exploratory analysis seems underpowered (e.g., large multiple regression models with only about 40 participants).

      We thank the reviewer for the comment and appreciate the concern that sample size could affect the reliability of the results in the multiple regression analyses.

      In Fig. 2, the multiple regression analyses were conducted after we observed a valence-dependent effect on hyperaltruism (Fig. 2A) and the regression was constructed accordingly:

      Choice ~ Δs*context*recipient + Δm*context*recipient + (1 + Δs*context*recipient + Δm*context*recipient | subject)

      Where Δs and Δm indicate the shock level and monetary reward differences between the more and less painful options, context is the monetary valence (gain vs. loss) and recipient is the identity of the shock recipient (self vs. other).

      Since we have 240 trials for each subject and a total of 80 subjects in Study 1, we believe that this is a reasonable regression analysis to perform.

      In Fig. 3, the multiple regression analyses were indeed exploratory. More specifically, we ran 3 multiple linear regressions:

      hyperaltruism~EC*context+IH*context+IB*context

      Relative harm sensitivity~ EC*context+IH*context+IB*context

      Relative money sensitivity~ EC*context+IH*context+IB*context

      Where Hyperaltruism is defined as κ<sub>other</sub> - κ<sub>self</sub>, Relative harm sensitivity as other β<sub>Δs</sub> - self β<sub>Δs</sub> and Relative money sensitivity as other β<sub>Δm</sub> - self β<sub>Δm</sub>. EC (empathic concern), IH (instrumental harm) and IB (impartial beneficence) were subjects’ scores from the corresponding questionnaires.

      For the first regression, we tested whether EC, IH and IB scores were related to hyperaltruism and it should be noted that this was tested on 80 subjects (Study 1). After we identified the effect of IH on hyperaltruism, we ran the following two regressions. The reason we still included IB and EC as predictors in these two regression analyses was to remove potential confounds caused by EC and IB since previous research indicated that IB, IH and EC could be correlated (Kahane et al., 2018).

      In study 2, we performed the following regression analyses again to validate our results (Placebo treatment in study 2 should have similar results as found in study 1).

      Relative harm sensitivity~ EC*context+IH*context+IB*context

      Relative money sensitivity~ EC*context+IH*context+IB*context

      Again, we added IB and EC only to control for nuisance effects of the covariates. As indicated in Fig. 5 C-D, the placebo condition in study 2 replicated our previous findings in study 1, and OXT administration effectively removed the interaction effect between IH and valence (gain vs. loss) on subjects’ relative harm sensitivity.

      To more objectively present our data and results, we have changed the texts in the results section and pointed out that the regression analysis:

      hyperaltruism~EC*context+IH*context+IB*context

      was exploratory (Lines 186-192).

      “We tested how hyperaltruism was related to both IH and IB across decision contexts using an exploratory multiple regression analysis. Moral preference, defined as κ<sub>other</sub> - κ<sub>self</sub>, was negatively associated with IH (β=-0.031±0.011, t<sub>156</sub>=-2.784, P=0.006) but not with IB (β=0.008±0.016, t<sub>156</sub>=0.475, P=0.636) across gain and loss contexts, reflecting a general connection between moral preference and IH (Fig. 3A & B).”

      (4) Inaccurate conceptualization of utilitarian psychology and the questionnaire used to measure it.

      Also in recommendations:

      (2) Throughout the paper, the authors placed lots of weight on individual differences in utilitarian psychology and the Oxford Utilitarianism Scale (OUS). I am not sure this is the best individual difference measure in this context. I don't see a conceptual fit between the psychological construct that OUS reflects, and the key psychological processes underlying the behaviors in the present study. As far as I understand it, the conceptual core of utilitarian psychology that OUS captures is the maximization of greater goods. Neither the Instrumental Harm (IH) component nor the Impartial Beneficence (IB) component reflects a tradeoff between the personal interests of the decision-making agent and a moral principle. The IH component is about the endorsement of harming a smaller number of individuals for the benefit of a larger number of individuals. The IB component is about treating self, close others, and distant others equally. However, the behavioral task used in this study is neither about distributing harm between a smaller number of others and a larger number of others nor about benefiting close or distant others. The fact that IH showed some statistical association with the behavioral tendency in the present data set could be due to the conceptual overlap between IH and an individual's tendency to inflict harm (e.g., psychopathy; Table 7 in Kahane et al., 2018, which the authors cited). I urge the authors to justify more why they believe that conceptually OUS is an appropriate individual difference measure in the present study, and if so, interpret their results in a clearer and justifiable manner (taking into account the potential confound of harm tendency/psychopathy).

      We thank the reviewer for the thoughtful comment and agree that “IH component is about the endorsement of harming a smaller number of individuals for the benefit of a larger number of individuals. The IB component is about treating self, close others, and distant others equally”. As we mentioned in the previous response to the reviewer, we first ran an exploratory multiple linear regression analysis of hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>) against IB and IH in study 1, based on the hypothesis that the reduction of hyperaltruistic preference in the loss condition might be due to 1) subjects’ altered attitudes linking IB and hyperaltruistic preference across the gain and loss conditions, and/or 2) the loss condition changing how the moral norm was perceived and thereby affecting the correlation between IH and hyperaltruistic preference. As Fig. 3 shows, we did not find a significant IB effect on hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>), nor on the relative harm or money sensitivity (supplementary Fig. 3). These results excluded the possibility that subjects with higher IB might treat self and others more equally and therefore show less hyperaltruistic preference. On the other hand, we found a strong correlation between hyperaltruistic preference and IH (Fig. 3A): subjects with higher IH scores showed less hyperaltruistic preference. Since the hyperaltruistic preference (κ<sub>other</sub> - κ<sub>self</sub>) is a compound variable, we further broke it down into subjects’ relative sensitivity to harm and money (other β<sub>∆s</sub> - self β<sub>∆s</sub> and other β<sub>∆m</sub> - self β<sub>∆m</sub>, respectively). The follow-up regression analyses revealed that the correlation between subjects’ relative harm sensitivity and IH was altered by the decision contexts (gain vs. loss, Fig. 3C-D).

      These results are consistent with our hypothesis that for subjects to engage in the utilitarian calculation, they should first realize that there is a moral dilemma (harming others to make monetary gain in the gain condition). When there was less perceived moral conflict (due to the framing of decision context as avoiding loss in the loss condition), the correlation between subjects’ relative harm sensitivity and IH became non-significant (Fig. 3C). It is worth noting that these results were further replicated in the placebo condition of study 2, further indicating that the role of OXT is to affect how the decision context is morally framed.

      The reviewer also raised an interesting possibility that the correlation between subjects’ behavioral tendency and IH may be confounded by the fact that IH is also correlated with other traits such as psychopathy. Indeed, in the Kahane et al., 2018 paper, the authors showed that IH was associated with subclinical psychopathy in a lay population. Although we only collected and included IB and Empathic concern (EC) scores as control variables and in principle could not rule out the influence of psychopathy, we argue that this is unlikely to be the case. First, psychopaths by definition “only care about their own good” (Kahane et al., 2018). However, subjects in our studies, as well as in previous research, showed greater aversion to harming others (compared to harming themselves) in the gain conditions. This is opposite to the prediction of psychopathy. Even in the loss condition, subjects showed similar levels of aversion to harming others (vs. harming themselves), indicating that our subjects valued their own and others’ well-being similarly. Second, although there appears to be an association between utilitarian judgement and psychopathy (Glenn et al., 2010; Kahane et al., 2015), the fact that people also possess a form of universal or impartial beneficence in their utilitarian judgements suggests that psychopathy alone is not a sufficient variable for explaining subjects’ hyperaltruistic behavior.

      We have thus rewritten part of the results to clarify our rationale for using the Oxford Utilitarianism Scale (especially the IH and IB) to establish the relationship between moral traits and subjects’ decision preference (Lines 212-215):

      “Furthermore, our results are consistent with the claim that profiting from inflicting pains on another person (IH) is inherently deemed immoral<sup>1</sup>. Hyperaltruistic preference, therefore, is likely to be associated with subjects’ IH dispositions.”

      (3) Relatedly, in the Discussion, the authors mentioned "the money-pain trade-off task, similar to the well-known trolley dilemma". I am not sure if this statement is factually accurate because the "well-known trolley dilemma" is about a disinterested third-party weighing between two moral requirements - "greatest good for the greatest number" (utilitarianism) and "do no harm" (Kantian/deontology), not between a moral requirement and one's own monetary interest (which is the focus of the present study). The analogy would be more appropriate if the task required the participants to trade off between, for example, harming one person in exchange for a charitable donation, as a recent study employed (Siegel et al., 2022, A computational account of how individuals resolve the dilemma of dirty money. Scientific reports). I urge the authors to go through their use of "utilitarian/utilitarianism" in the paper and make sure their usage aligns with the definition of the concept and the philosophical implications.

      We thank the reviewer for prompting us to think over the difference between our task and the trolley dilemma. Indeed, the trolley dilemma refers to a disinterested third party’s decision between two moral requirements, namely, utilitarianism and deontology. In our study, when the shock recipient was “other”, our task could be interpreted either as a decision between the “moral norm of no harm (deontology) and one’s self-interest maximization (utilitarian)”, or as a decision between the “greatest good for both parties (utilitarian) vs. do no harm (deontology)”, though the latter interpretation typically requires differential weighing of own benefits versus the benefits of others (Fehr & Schmidt, 1999; Saez et al., 2015). In fact, it could be argued that the utilitarianism account applies not only to the third party’s well-being, but also to our own well-being, or to “that of those near or dear to us” (Kahane et al., 2018).

      We acknowledge that a direct analogy between our task and the trolley dilemma may be lacking and have therefore deleted the trolley example from the discussion.

      (5) Related to the above point, the sample size of Study 2 was calculated based on the main effect of oxytocin. However, the authors also reported several regression models that seem to me more like exploratory analyses. Their sample size may not be sufficient for these analyses. The authors should: a) explicitly distinguish between their hypothesis-driven analysis and exploratory analysis; b) report achieved power of their analysis.

      We appreciate the reviewer’s thorough reading of our manuscript. Following the reviewer’s suggestions, we have explicitly stated in the revised manuscript which analyses were exploratory and which were hypothesis-driven. Following the reviewer’s request, we added the achieved power to the main text (Lines 274-279):

      “The effect size (Cohen’s f<sup>2</sup>) for this exploratory analysis was calculated to be 0.491 and 0.379 for the placebo and oxytocin conditions, respectively. The post hoc power analysis with a significance level of α = 0.05, 7 regressors (IH, IB, EC, decision context, IH×context, IB×context, and EC×context), and sample size of N = 46 yielded achieved power of 0.910 (placebo treatment) and 0.808 (oxytocin treatment).”
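
      For readers unfamiliar with how a post hoc power figure of this kind is usually obtained, the quantities involved can be sketched as follows. This assumes Cohen's fixed-effects framework for an omnibus F test in multiple regression, with the noncentrality parameter computed from the full sample size; the response does not state which software or convention the authors used, so the symbols u, v and λ below are illustrative assumptions rather than the authors' notation.

```latex
\begin{align*}
f^2 &= \frac{R^2}{1 - R^2}, \qquad u = 7, \qquad v = N - u - 1 = 46 - 7 - 1 = 38,\\
\lambda &= f^2\,(u + v + 1) = f^2 N,\\
\text{power} &= \Pr\!\left[\,F'(u, v, \lambda) > F_{1-\alpha}(u, v)\,\right], \qquad \alpha = 0.05,
\end{align*}
```

      where F′(u, v, λ) denotes a noncentral F variable and F<sub>1-α</sub>(u, v) the central critical value. Under these assumptions, λ would be roughly 0.491 × 46 ≈ 22.6 for the placebo condition and 0.379 × 46 ≈ 17.4 for the oxytocin condition.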

      (6) Do the authors collect reaction times (RT) information? Did the decision context and oxytocin modulate RT? Based on their procedure, it seems that the authors adopted a speeded response task, therefore the RT may reflect some psychological processes independent of choice. It is also possible (and recommended) that the authors use the drift-diffusion model to quantify latent psychological processes underlying moral decision-making. It would be interesting to see if their manipulations have any impact on those latent psychological processes, in addition to explicit choice, which is the endpoint product of the latent psychological processes. There are some examples of applying DDM to this task, which the authors could refer to if they decide to go down this route (Yu et al, 2021, How peer influence shapes value computation in moral decision-making. Cognition.)

      We did collect the RT information for this experiment. As demonstrated in the figure below, participants exhibited significantly longer RT in the loss context compared to the gain context (Study 1: the main effect of decision context: F<sub>1,79</sub>=20.043, P < 0.001, η<sup>2</sup>=0.202; Study 2-placebo: F<sub>1,45</sub>=17.177, P < 0.001, η<sup>2</sup>=0.276). In addition to this effect of context, decisions were significantly slower in the other-condition compared to the self-condition (Study 1: the main effect of recipient: F<sub>1,79</sub>=4.352, P < 0.040, η<sup>2</sup>=0.052; Study 2-placebo: F<sub>1,45</sub>=5.601, P < 0.022, η<sup>2</sup>=0.111), which replicates previous research findings (Crockett et al., 2014). However, the difference in response time between recipients was not modulated by decision context (Study 1: context × recipient interaction: F<sub>1,79</sub>=1.538, P < 0.219, η<sup>2</sup>=0.019; Study 2-placebo: F<sub>1,45</sub>=2.631, P < 0.112, η<sup>2</sup>=0.055). Additionally, the results in the oxytocin study (study 2) revealed no evidence supporting any effect of oxytocin on reaction time. Neither the main effect (treatment: placebo vs. oxytocin) nor the interaction effect of oxytocin on response time was statistically significant (main effect of OXT treatment: F<sub>1,45</sub>=2.380, P < 0.230, η<sup>2</sup>=0.050; treatment × context: F<sub>1,45</sub>=2.075, P < 0.157, η<sup>2</sup>=0.044; treatment × recipient: F<sub>1,45</sub>=0.266, P < 0.609, η<sup>2</sup>=0.006; treatment × context × recipient: F<sub>1,45</sub>=2.909, P < 0.095, η<sup>2</sup>=0.061).

      Author response image 4.
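
      The effect sizes reported with the RT analyses above can be cross-checked from the F statistics alone. The following is a minimal sketch, assuming that the reported η<sup>2</sup> values are partial eta squared (the response does not say so explicitly); the function name and the list of tuples are illustrative and not part of the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error, and df_error equals SS_error / MS_error,
    # so the ratio below is SS_effect / (SS_effect + SS_error).
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# (label, F, df_effect, df_error, reported effect size), copied from the response text.
reported = [
    ("Study 1, context", 20.043, 1, 79, 0.202),
    ("Study 2 placebo, context", 17.177, 1, 45, 0.276),
    ("Study 1, recipient", 4.352, 1, 79, 0.052),
    ("Study 2 placebo, recipient", 5.601, 1, 45, 0.111),
]

for label, f_value, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.3f}")
```

      Under that assumption the computed and reported values agree to three decimals for these four effects, which also supports reading the degrees of freedom of the Study 2 context effect as 1 and 45.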

      We also agree that it would be interesting to investigate how OXT might impact the dynamics of the decision process using a drift-diffusion model (DDM). However, we have already shown in the original manuscript that OXT increased subjects’ relative harm sensitivities. If a canonical DDM is adopted here, then such an OXT effect is more likely to correspond to an increased drift rate for the relative harm sensitivity, which we feel still aligns with the current framework in general. In future studies, including further manipulations such as time pressure might be a more comprehensive approach to investigate the effect of OXT on DDM-related decision variables such as attribute drift rate, initial bias, decision threshold and attribute synchrony.

      (7) This is just a personal preference, but I would avoid metaphoric language in a scientific paper (e.g., rescue, salvage, obliterate). Plain, neutral English terms can express the same meaning clearly (e.g., restore, vanish, eliminate).

      Again, we thank the reviewer for the suggestion and have since modified the terms.

      Reviewer #3:

      The primary weakness of the paper concerns its framing. Although it purports to be measuring "hyper-altruism" it does not provide evidence to support why any of the behavior being measured is extreme enough to warrant the modifier "hyper" (and indeed throughout I believe the writing tends toward hyperbole, using, e.g., verbs like "obliterate" rather than "reduce"). More seriously, I do not believe that the task constitutes altruism, but rather the decision to engage, or not engage, in instrumental aggression.

      We agree with the reviewer (and Reviewer #2) that plain and clear English should be used to describe our results and have since modified those terms.

      However, the term “hyperaltruism”, which is the main theme of our study, was originally proposed by a seminal paper (Crockett et al., 2014) and has since been widely adopted in related studies (Crockett et al., 2014, 2015, 2017; Volz et al., 2017; Zhan et al., 2020). The term “hyperaltruism” was introduced to emphasize the difference from altruism (Chen et al., 2024; FeldmanHall et al., 2015; Hu et al., 2021; Hutcherson et al., 2015; Lockwood et al., 2017; Xiong et al., 2020). Hyperaltruism does not indicate extreme altruism. Instead, it simply reflects the fact that “we are more willing to sacrifice gains to spare others from harm than to spare ourselves from harm” (Volz et al., 2017). In other words, altruism refers to people’s unselfish regard for or devotion to the welfare of others, whereas hyperaltruism takes the subject’s own cost-benefit preference as the reference point and highlights the “additional” altruistic preference when considering others’ welfare. For example, in the altruistic experimental design, altruism is characterized by the degree to which subjects take other people’s welfare into account (left panel). However, in a typical hyperaltruism task design (right panel), hyperaltruistic preference is operationally defined as the difference (κ<sub>other</sub> - κ<sub>self</sub>) between the degrees to which subjects value others’ harm (κ<sub>other</sub>) and their own harm (κ<sub>self</sub>).

      Author response image 5.

      I found it surprising that a paradigm that entails deciding to hurt or not hurt someone else for personal benefit (whether acquiring a financial gain or avoiding a loss) would be described as measuring "altruism." Deciding to hurt someone for personal benefit is the definition of instrumental aggression. I did not see that in any of the studies was there a possibility of acting to benefit the other participant in any condition. Altruism is not equivalent to refraining from engaging in instrumental aggression. True altruism would be to accept shocks to the self for the other's benefit (e.g., money). The interpretation of this task as assessing instrumental aggression is supported by the fact that only the Instrumental Harm subscale of the OUS was associated with outcomes in the task, but not the Impartial Benevolence subscale. By contrast, the IB subscale is the one more consistently associated with altruism (e.g., Kahane et al., 2018; Amormino et al., 2022). I believe it is important for scientific accuracy for the paper, including the title, to be re-written to reflect what it is testing.

      Again, as we mentioned in the previous response, hyperaltruism is a term coined almost a decade ago and has since been widely adopted in the research field. We are afraid that switching such a term would be more likely to cause confusion (instead of clarity) among readers.

      Also, from the utilitarian perspective, gains and losses (or harm) that occur to someone else lie on the same dimension, and there is no discontinuity between gains and losses. Therefore, taking actions to avoid someone else’s loss can also be viewed as altruistic behavior, similar to choices that increase others’ welfare (Liu et al., 2020).

      Relatedly: in the introduction I believe it would be important to discuss the non-symmetry of moral obligations related to help/harm--we have obligations not to harm strangers but no obligation to help strangers. This is another reason I do not think the term "hyper altruism" is a good description for this task--given it is typically viewed as morally obligatory not to harm strangers, choosing not to harm them is not "hyper" altruistic (and again, I do not view it as obviously altruism at all).

      We agree with the reviewer’s point that we have a moral obligation not to harm others but no obligation to help strangers (Liu et al., 2020). In fact, this is exactly what we argued in our manuscript: by switching the decision context from gains to losses, subjects were less likely to perceive the decisions as “harming others”. Furthermore, after the administration of OXT, decisions in both the gain and loss contexts were more likely to be perceived by subjects as harming others (Fig. 6A).

      The framing of the role of OT also felt incomplete. In introducing the potential relevance of OT to behavior in this task, it is important to pull in evidence from non-human animals on origins of OT as a hormone selected for its role in maternal care and defense (including defensive aggression). The non-human animal literature regarding the effects of OT is on the whole much more robust and definitive than the human literature. The evidence is abundant that OT motivates the defensive care of offspring of all kinds. My read of the present OT findings is that they increase participants' willingness to refrain from shocking strangers even when incurring a loss (that is, in a context where the participant is weighing harm to themselves versus harm to the other). It will be important to explain why OT would be relevant to refraining from instrumental aggression, again, drawing on the non-human animal literature.

      We thank the reviewer for these comments and agree that the current understanding of the link between our OT results and the animal literature can at best be described as vague and intriguing. Current literature on OT in animal research suggests that nucleus accumbens (NAc) oxytocin might play a critical role in social cognition and in reinforcing social interactions (Dölen et al., 2013; Dölen & Malenka, 2014; Insel, 2010). Though much insight has already been gained from animal studies, in humans, social interactions can take a variety of different forms, and consociate recognition can also be rather dynamic. For example, male human participants who self-administered OT showed higher trust and cooperation towards in-group members but more defensive aggression towards out-group members (De Dreu et al., 2010). In another human study, participants administered OT showed more coordinated out-group attack behavior, suggesting that OT might increase in-group efficiency at the cost of harming out-group members (Zhang et al., 2019). It is worth pointing out that in both experiments, the participants’ group membership was artificially assigned, thus highlighting the context-dependent nature of the OT effect in humans.

      In our experiment, more complex and higher-level social cognitive processes such as moral framing and moral perception are involved, and OT seems to play an important role in affecting these processes. Therefore, we admit that for this study, like the ones mentioned above, it is unfortunately rather hard to find a non-human animal counterpart. Instead of relating OT to instrumental aggression, we aimed to provide a parsimonious framework to explain why the “hyperaltruism” disappeared in the loss condition and, with the OT administration, reappeared in both the gain and loss conditions, while also considering the effects of other relevant variables.

      We concur with the reviewer’s comments about the importance of animal research and have since added the following paragraph to the revised manuscript (Lines 86-90) as well as to the discussion:

      “Oxytocin has been shown to play a critical role in social interactions such as maternal attachment, pair bonding, consociate attachment and aggression in a variety of animal models[42,43]. Humans are endowed with higher cognitive and affective capacities and exhibit far more complex social cognitive patterns[44].”

      Another important limitation is the use of only male participants in Study 2. This was not an essential exclusion. It should be clear throughout sections of the manuscript that this study's effects can be generalized only to male participants.

      We thank the reviewer for this comment. Prior research has shown sex differences in oxytocin’s effects (Fischer-Shofty et al., 2013; Hoge et al., 2014; Lynn et al., 2014; Ma et al., 2016; MacDonald, 2013). Furthermore, given the potential confounds of the OT effect due to menstrual cycles and potential pregnancy in female subjects, most human OT studies have only recruited male subjects (Berends et al., 2019; De Dreu et al., 2010; Fischer-Shofty et al., 2010; Ma et al., 2016; Zhang et al., 2019). We have modified our manuscript to emphasize that study 2 only recruited male subjects.

      Recommendations:

      I believe the authors have provided an interesting and valuable dataset related to the willingness to engage in instrumental aggression - this is not the authors' aim, although also an important aim. Future researchers aiming to build on this paper would benefit from it being framed more accurately.

      Thus, I believe the paper must be reframed to accurately describe the nature of the task as assessing instrumental aggression. This is also an important goal, as well-designed laboratory models of instrumental aggression are somewhat lacking.

      Please see our response above: to better connect with previous research, we believe that the term hyperaltruism aligns better with the main theme of this study.

      The research literature on other aggression tasks should also be brought in, as I believe these are more relevant to the present study than research studies on altruism that are primarily donation-type tasks. It should be added to the limitations of how different aggression in a laboratory task such as this one is from real-world immoral forms of aggression. Arguably, aggression in a laboratory task in which all participants are taking part voluntarily under a defined set of rules, and in which aggression constrained by rules is mutual, is similar to aggression in sports, which is not considered immoral. Whether responses in this task would generalize to immoral forms of aggression cannot be determined without linking responses in the task to some real-world outcome.

      We agree with the reviewer that “aggression in a lab task … is similar to aggression in sports”. Our starting point was to investigate the boundary conditions for hyperaltruism (though we don’t deny that there is an aggression component in hyperaltruism, given the experimental design we used). In other words, the dependent variable we were interested in was the difference between “other” and “self” aggression, not the aggression itself. Our results showed that by switching the decision context from the monetary gain environment to the loss condition, human participants were willing to bear similar amounts of monetary loss to spare others and themselves from harm. That is, hyperaltruism disappeared in the loss condition. We interpreted this result as the loss condition prompting subjects to adopt a different moral framework (help vs. harm, Fig. 6A), with subjects less influenced by their instrumental harm personality trait due to the change of moral framework (Fig. 3C). In the following study (study 2), we further tested this hypothesis and verified that the administration of OT indeed increased subjects’ perception of the task as harming others for both gain and loss conditions (Fig. 6A), and that such moral perception mediated the relationship between subjects’ personality traits (instrumental harm) and their relative harm sensitivities (the difference of aggression between the other- and self-conditions). We believe that the moral perception framework, in which OT directly modulates moral perception, better accounts for subjects’ context-dependent choices than hypothesizing context-dependent modulation effects of OT on aggression.

      The language should also be toned down--the use of phrases like "hyper altruism" (without independent evidence to support that designation) and "obliterate" rather than "reduce" or "eliminate" are overly hyperbolic.

      We have changed terms such as “obliterate” and “eliminate” to plain English, as the reviewer suggested.

      Reference

      Abu-Akel, A., Palgi, S., Klein, E., Decety, J., & Shamay-Tsoory, S. (2015). Oxytocin increases empathy to pain when adopting the other- but not the self-perspective. Social Neuroscience, 10(1), 7โ€“15.

      Barchi-Ferreira, A., & Osรณrio, F. (2021). Associations between oxytocin and empathy in humans: A systematic literature review. Psychoneuroendocrinology, 129, 105268.

      Berends, Y. R., Tulen, J. H. M., Wierdsma, A. I., van Pelt, J., Feldman, R., Zagoory-Sharon, O., de Rijke, Y. B., Kushner, S. A., & van Marle, H. J. C. (2019). Intranasal administration of oxytocin decreases task-related aggressive responses in healthy young males. Psychoneuroendocrinology, 106, 147โ€“154.

      Chen, J., Putkinen, V., Seppรคlรค, K., Hirvonen, J., Ioumpa, K., Gazzola, V., Keysers, C., & Nummenmaa, L. (2024). Endogenous opioid receptor system mediates costly altruism in the human brain. Communications Biology, 7(1), 1โ€“11.

      Crockett, M. J., Kurth-Nelson, Z., Siegel, J. Z., Dayan, P., & Dolan, R. J. (2014). Harm to others outweighs harm to self in moral decision making. Proceedings of the National Academy of Sciences of the United States of America, 111(48), 17320โ€“17325.

      Crockett, M. J., Siegel, J. Z., Kurth-Nelson, Z., Dayan, P., & Dolan, R. J. (2017). Moral transgressions corrupt neural representations of value. Nature Neuroscience, 20(6), 879โ€“885.

      Crockett, M. J., Siegel, J. Z., Kurth-Nelson, Z., Ousdal, O. T., Story, G., Frieband, C., Grosse-Rueskamp, J. M., Dayan, P., & Dolan, R. J. (2015). Dissociable Effects of Serotonin and Dopamine on the Valuation of Harm in Moral Decision Making. Current Biology, 25(14), 1852โ€“1859.

      De Dreu, C. K. W., Greer, L. L., Handgraaf, M. J. J., Shalvi, S., Van Kleef, G. A., Baas, M., Ten Velden, F. S., Van Dijk, E., & Feith, S. W. W. (2010). The Neuropeptide Oxytocin Regulates Parochial Altruism in Intergroup Conflict Among Humans. Science, 328(5984), 1408โ€“1411.

      De Dreu, C. K. W., Gross, J., Fariรฑa, A., & Ma, Y. (2020). Group Cooperation, Carrying-Capacity Stress, and Intergroup Conflict. Trends in Cognitive Sciences, 24(9), 760โ€“776.

      Dรถlen, G., Darvishzadeh, A., Huang, K. W., & Malenka, R. C. (2013). Social reward requires coordinated activity of nucleus accumbens oxytocin and serotonin. Nature, 501(7466), 179โ€“184.

      Dรถlen, G., & Malenka, R. C. (2014). The Emerging Role of Nucleus Accumbens Oxytocin in Social Cognition. Biological Psychiatry, 76(5), 354โ€“355.

      Evans, S., Shergill, S. S., & Averbeck, B. B. (2010). Oxytocin Decreases Aversion to Angry Faces in an Associative Learning Task. Neuropsychopharmacology, 35(13), 2502โ€“2509.

      Fehr, E., & Schmidt, K. M. (1999). A Theory of Fairness, Competition, and Cooperation*. The Quarterly Journal of Economics, 114(3), 817โ€“868.

      FeldmanHall, O., Dalgleish, T., Evans, D., & Mobbs, D. (2015). Empathic concern drives costly altruism. Neuroimage, 105, 347โ€“356.

      Fischer-Shofty, M., Levkovitz, Y., & Shamay-Tsoory, S. G. (2013). Oxytocin facilitates accurate perception of competition in men and kinship in women. Social Cognitive and Affective Neuroscience, 8(3), 313โ€“317.

      Fischer-Shofty, M., Shamay-Tsoory, S. G., Harari, H., & Levkovitz, Y. (2010). The effect of intranasal administration of oxytocin on fear recognition. Neuropsychologia, 48(1), 179โ€“184.

      Glenn, A. L., Koleva, S., Iyer, R., Graham, J., & Ditto, P. H. (2010). Moral identity in psychopathy. Judgment and Decision Making, 5(7), 497โ€“505.

      Hoge, E. A., Anderson, E., Lawson, E. A., Bui, E., Fischer, L. E., Khadge, S. D., Barrett, L. F., & Simon, N. M. (2014). Gender moderates the effect of oxytocin on social judgments. Human Psychopharmacology: Clinical and Experimental, 29(3), 299โ€“304.

      Hu, J., Hu, Y., Li, Y., & Zhou, X. (2021). Computational and Neurobiological Substrates of Cost-Benefit Integration in Altruistic Helping Decision. Journal of Neuroscience, 41(15), 3545โ€“3561.

      Hutcherson, C. A., Bushong, B., & Rangel, A. (2015). A Neurocomputational Model of Altruistic Choice and Its Implications. Neuron, 87(2), 451โ€“462.

      Insel, T. R. (2010). The Challenge of Translation in Social Neuroscience: A Review of Oxytocin, Vasopressin, and Affiliative Behavior. Neuron, 65(6), 768โ€“779.

      Kahane, G., Everett, J. A. C., Earp, B. D., Caviola, L., Faber, N. S., Crockett, M. J., & Savulescu, J. (2018). Beyond sacrificial harm: A two-dimensional model of utilitarian psychology. Psychological Review, 125(2), 131โ€“164.

      Kahane, G., Everett, J. A. C., Earp, B. D., Farias, M., & Savulescu, J. (2015). โ€˜Utilitarianโ€™ judgments in sacrificial moral dilemmas do not reflect impartial concern for the greater good. Cognition, 134, 193โ€“209.

      Kahneman, D., & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47(2), 263.

      Kapetaniou, G. E., Reinhard, M. A., Christian, P., Jobst, A., Tobler, P. N., Padberg, F., & Soutschek, A. (2021). The role of oxytocin in delay of gratification and flexibility in non-social decision making. eLife, 10, e61844.

      Kirsch, P., Esslinger, C., Chen, Q., Mier, D., Lis, S., Siddhanti, S., Gruppe, H., Mattay, V. S., Gallhofer, B., & Meyer-Lindenberg, A. (2005). Oxytocin Modulates Neural Circuitry for Social Cognition and Fear in Humans. The Journal of Neuroscience, 25(49), 11489โ€“11493.

      Liu, J., Gu, R., Liao, C., Lu, J., Fang, Y., Xu, P., Luo, Y., & Cui, F. (2020). The Neural Mechanism of the Social Framing Effect: Evidence from fMRI and tDCS Studies. The Journal of Neuroscience, 40(18), 3646โ€“3656.

      Liu, Y., Li, L., Zheng, L., & Guo, X. (2017). Punish the Perpetrator or Compensate the Victim? Gain vs. Loss Context Modulate Third-Party Altruistic Behaviors. Frontiers in Psychology, 8, 2066.

      Lockwood, P. L., Hamonet, M., Zhang, S. H., Ratnavel, A., Salmony, F. U., Husain, M., & Maj, A. (2017). Prosocial apathy for helping others when effort is required. Nature Human Behaviour, 1(7), 131โ€“131.

      Losecaat Vermeer, A. B., Boksem, M. A. S., & Sanfey, A. G. (2020). Third-party decision-making under risk as a function of prior gains and losses. Journal of Economic Psychology, 77, 102206.

      Lynn, S. K., Hoge, E. A., Fischer, L. E., Barrett, L. F., & Simon, N. M. (2014). Gender differences in oxytocin-associated disruption of decision bias during emotion perception. Psychiatry Research, 219(1), 198โ€“203.

      Ma, Y., Liu, Y., Rand, D. G., Heatherton, T. F., & Han, S. (2015). Opposing Oxytocin Effects on Intergroup Cooperative Behavior in Intuitive and Reflective Minds. Neuropsychopharmacology, 40(10), 2379โ€“2387.

      Ma, Y., Shamay-Tsoory, S., Han, S., & Zink, C. F. (2016). Oxytocin and Social Adaptation: Insights from Neuroimaging Studies of Healthy and Clinical Populations. Trends in Cognitive Sciences, 20(2), 133โ€“145.

      MacDonald, K. S. (2013). Sex, Receptors, and Attachment: A Review of Individual Factors Influencing Response to Oxytocin. Frontiers in Neuroscience, 6. 194.

      Markiewicz, ล., & Czupryna, M. (2018). Cheating: One Common Morality for Gain and Losses, but Two Components of Morality Itself. Journal of Behavior Decision Making. 33(2), 166-179.

      Pachur, T., Schulte-Mecklenbeck, M., Murphy, R. O., & Hertwig, R. (2018). Prospect theory reflects selective allocation of attention. Journal of Experimental Psychology: General, 147(2), 147โ€“169.

      Radke, S., Roelofs, K., & De Bruijn, E. R. A. (2013). Acting on Anger: Social Anxiety Modulates Approach-Avoidance Tendencies After Oxytocin Administration. Psychological Science, 24(8), 1573โ€“1578.

      Saez, I., Zhu, L., Set, E., Kayser, A., & Hsu, M. (2015). Dopamine modulates egalitarian behavior in humans. Current Biology, 25(7), 912โ€“919.

      Teoh, Y. Y., Yao, Z., Cunningham, W. A., & Hutcherson, C. A. (2020). Attentional priorities drive effects of time pressure on altruistic choice. Nature Communications, 11(1), 3534.

      Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811), 515–518.

      Usher, M., & McClelland, J. L. (2004). Loss Aversion and Inhibition in Dynamical Models of Multialternative Choice. Psychological Review, 111(3), 757–769.

      Volz, L. J., Welborn, B. L., Gobel, M. S., Gazzaniga, M. S., & Grafton, S. T. (2017). Harm to self outweighs benefit to others in moral decision making. Proceedings of the National Academy of Sciences of the United States of America, 114(30), 7963–7968.

      Wu, Q., Mao, J., & Li, J. (2020). Oxytocin alters the effect of payoff but not base rate in emotion perception. Psychoneuroendocrinology, 114, 104608.

      Wu, S., Cai, W., & Jin, S. (2018). Gain or non-loss: The message matching effect of regulatory focus on moral judgements of other-orientation lies. International Journal of Psychology, 53(3), 223-227.

      Xiong, W., Gao, X., He, Z., Yu, H., Liu, H., & Zhou, X. (2020). Affective evaluation of others’ altruistic decisions under risk and ambiguity. Neuroimage, 218, 116996.

      Yechiam, E., & Hochman, G. (2013). Losses as modulators of attention: Review and analysis of the unique effects of losses over gains. Psychological Bulletin, 139(2), 497–518.

      Zhan, Y., Xiao, X., Tan, Q., Li, J., Fan, W., Chen, J., & Zhong, Y. (2020). Neural correlations of the influence of self-relevance on moral decision-making involving a trade-off between harm and reward. Psychophysiology, 57(9), e13590.

      Zhang, H., Gross, J., De Dreu, C., & Ma, Y. (2019). Oxytocin promotes coordinated out-group attack during intergroup conflict in humans. eLife, 8, e40698.

    1. eLife Assessment

      This important study suggests that adolescent mice exhibit less accuracy than adult mice in a sound discrimination task when the sound frequencies are very similar. The evidence supporting this observation is solid and suggests that it arises from cognitive control differences between adolescent and adult mice. The adolescent period is largely understudied, despite its contribution to shaping the adult brain, which makes this study interesting for a broad range of neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across four auditory cortical regions to quantify the neuronal activity related to the behavior. At the single-cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were due not only to learning but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavioral results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      - The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice address the research question in a convincing way.
      - The analysis, including the systematic comparison of task performance across the two age groups, is most interesting, and reveals differences in learning (or learning strategies?) that are compelling.
      - Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      - The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.
      - The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other does not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? Compared to the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data, however, show that this is not the case (Fig. S7). And even if it were the case, how would it be compatible with the published literature?

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish, and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We spotted and corrected several inconsistencies and mislabelling issues throughout the text and figures. Thanks!

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We carefully reviewed the specific claims and fixed some of the wording so it adheres to the data shown.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We now carried out additional analysis to test this. We found that while AUDp and AUDv exhibit distinct tuning properties, they show similar differences between adolescent and adult neurons (see Supplementary Table 6, Fig. S7-1a-h). Note that TEa and AUDd could not be evaluated due to low numbers of modulated neurons in this protocol.

      D) Some analysis interpretations should be more cautious. (...) A lower lick rate in general could reflect a weaker ability to withhold licking, as indicated on l.164, but also many other things, such as a lower frustration threshold, lower satiation, more energy, etc.

      That is a fair comment, and we refined our interpretations. Moreover, we also addressed whether impulsiveness impacted lick rates. In the Educage, we found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs where licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). Finally, we note that potential differences in satiation were already addressed in the original manuscript by carefully examining the number of trials completed across the session. See also Reviewer 3, comment #1 below.

      Reviewer #2 (Public review):

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We reviewed the manuscript carefully and revised the relevant sections to clarify the rationale behind the analyses. See detailed responses to all the reviewer’s specific comments.

      B) The results of optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We expanded our discussion on these experiments (L495-511) and also added an additional analysis to strengthen our findings (Fig. S3-2e).

      Reviewer #3 (Public review):

      (1) The authors report that "adolescent mice showed lower auditory discrimination performance compared to adults" and that this performance deficit was due to (among other things) "weaker cognitive control". I'm not fully convinced of this interpretation, for a few reasons. First, the adolescents may simply have been thirstier, and therefore more willing to lick indiscriminately. The high false alarm rates in that case would not reflect a "weaker cognitive control" but rather, an elevated homeostatic drive to obtain water. Second, even the adult animals had relatively high (~40%) false alarm rates on the freely moving version of the task, suggesting that their behavior was not particularly well controlled either. One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      First, as requested, we added the Hit rates and FA rates for the head-fixed task (Fig. S3-1a). Second, as requested by the reviewer, we performed additional analyses in both the Educage and head-fixed versions of the task. Specifically, we analyzed the ITI duration following each trial outcome. We found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs during which licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). See also comment #D of reviewer #1 above.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We carefully checked the text to ensure that each claim is accurately supported by the corresponding reference.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We appreciate the reviewer’s concern. While we acknowledge that pooling neurons across auditory cortical subregions may obscure region-specific effects, our primary focus in this study is on developmental differences between adolescents and adults, which were far more pronounced than subregional differences.

      To address this potential limitation: (1) We analyzed firing differences across subregions during task engagement (see Fig. S4-1, S4-2, S4-3; Supplementary Tables 2 and 3). (2) We have now added new analyses for the passive listening condition in AUDp and AUDv (Fig. S7-1; Supplementary Table 6).

      These analyses support our conclusion that developmental stage has a greater impact on auditory cortical activity than subregional location in the contexts examined. For clarity and cohesion, the main text emphasizes developmental differences, while subregional analyses are presented in the Supplement.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We agree that other cortical layers, particularly supragranular layers, are important for auditory processing and plasticity. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L464-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The presentation of the paper must be strengthened. As it is now, it makes it difficult to appreciate the strengths of the results. Here are some points that should be addressed:

      a) The manuscript is full of inconsistencies that should be fixed to improve the reader's understanding. For example, the description on l.217 and Figure S3-1b, the D' value of 0 rounded to 0.01 on l. 735 (isn't it rather the z-scored value that is rounded? A D' of 0 is not a problem), the definition of lick bias on l. 750 and the values in Fig. 2, the legend of Figure 7F and what is displayed on the graph (is it population sparseness or responsiveness?), etc.

      We adjusted the legend and description of former Fig. S3-1b (now Fig. S3-2b).

      We now clarify that the rounded values refer to z-scored hit and false alarm rates that we used in the d' calculation. We adjusted the definition of the lick bias in Fig. 2 and Fig. S3-1b (L804).
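
      For readers following this exchange, the conventional sensitivity index is d' = z(hit rate) - z(false alarm rate), where z is the inverse of the standard normal cumulative distribution. The sketch below is only an illustration of that textbook definition, not the authors' code. The exact rounding rule used in the manuscript is not given here, so clipping extreme rates to `eps` is an assumption, as are the function and parameter names.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=0.01):
    """Textbook d' = z(hit rate) - z(false alarm rate).

    Assumption: rates of exactly 0 or 1 are clipped to [eps, 1 - eps]
    so that the z-transform stays finite. The value of eps is hypothetical.
    """
    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(clip(hit_rate)) - z(clip(fa_rate))


if __name__ == "__main__":
    print(d_prime(0.9, 0.1))   # well-separated hit and false alarm rates
    print(d_prime(0.5, 0.5))   # no discrimination: d' is 0
    print(d_prime(1.0, 0.0))   # extreme rates stay finite after clipping
```

      Note that a d' of 0 needs no special handling under this definition; only hit or false alarm rates of exactly 0 or 1 do, which is consistent with the reviewer's remark.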

      We replaced ‘population responsiveness’ with ‘population sparseness’ throughout the figures, legend and the text.

      b) References to figures are sometimes wrong (for example on l. 737,739).

      c) Some text is duplicated (for example l. 814 and l. 837).

      d) Typos should be corrected (for example l. 127, 'the', l. 787, 'upto').

      We deleted the incorrect references of this section, removed the duplicated text, and corrected the typos.

      e) Color code should be changed (for example the shades of blue for easy and hard tasks - they are extremely difficult to differentiate).

      After consideration, we decided to retain the blue color code (i.e., Fig. 1d, Fig. 3d, Fig. 4e-g, Fig. 5c, Fig. 6d–g), where the distinction between the shades of blue appears sufficiently clear and maintains visual consistency and aesthetic appeal. We did, however, make changes to the other color codes (Fig. 4, Fig. 5, Fig. 6, Fig. 7).

      f) Figure design should be improved. For example, why is a different logic used for displaying Figure 5A or B and Figure 1E?

      We adjusted the color scheme in Fig. 5. We chose to represent the data in Fig. 5 according to task difficulty, as this arrangement best illustrates the more pronounced deficits in population decoding in adolescents during the hard task.

      f) Why use a 3D representation in Figure 4G? (2)

      The 3D representation in Fig. 4g was chosen to illustrate the 3-way interactions between onset-latency, maximal discriminability, and duration of discrimination.

      g) Figure 1A, lower right panel- should "response" not be completed by "lick", "no lick"?

      We changed the labels to “Lick” and “No Lick” in Fig. 1a.

      h) l.18 the age mentioned is misleading, because the learning itself actually started 20 days earlier than what is cited here.

      Corrected.

      i) Explain what AAV5-... is on l.212.

      We added an explanation of virus components (see L216-220).

      (2) The comparison of CV in Figure 2 H-J is interesting. I am curious to know whether the differences in the easy and hard tasks could be due to a decrease in CV in adults, rather than an increase in CV in adolescents? Also, could the difference in J be due to 3 outliers?

      We agree that the observed CV differences may reflect a reduction in variability in adults rather than an increase in adolescents. We have revised the Results section accordingly to acknowledge this interpretation.

      Regarding the concern about potential outliers in Fig. 2J, we tested the data for outliers using the isoutlier function in MATLAB (defining outliers as values exceeding three standard deviations from the mean) and found no such cases.
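
      The three-standard-deviation criterion described above can be sketched in a few lines. This is an illustrative Python version rather than the MATLAB call the authors used, and the function name is hypothetical. One property worth keeping in mind when reading the result: with the sample standard deviation, no value can lie more than (n - 1)/sqrt(n) standard deviations from the mean, so this rule cannot flag any point in a sample of 10 or fewer values.

```python
from statistics import mean, stdev


def three_sd_outliers(values, n_sd=3.0):
    """Return the values lying more than n_sd sample SDs from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        return []
    return [v for v in values if abs(v - m) > n_sd * s]


if __name__ == "__main__":
    # Large sample with one extreme value: the extreme value is flagged.
    print(three_sd_outliers([10.0] * 20 + [100.0]))
    # Small sample: the same kind of extreme value cannot be flagged.
    print(three_sd_outliers([1.0, 2.0, 3.0, 4.0, 100.0]))
```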

      (3) Figure 2c shows that there is no difference in perceptual sensitivity between adolescents and adults, whereas the conclusion from Figure 4 is that adolescents exhibit lower discriminability in stimulus-related activity. Aren't these results contradictory?

      This is a nuanced point. The similar slopes of the psychometric functions (Fig. 2c) indicating comparable perceptual sensitivity and the lower AUC observed in the ACx of adolescents (Fig. 4) do not necessarily contradict each other. These two measures capture related but distinct issues: psychometric slopes reflect behavioral output, which integrates both sensory encoding and processing downstream to ACx, while the AUC analysis reflects stimulus-related neural activity in ACx, which may still include decision-related components.

      Note that stimulus-related neural discriminability outside the context of the task is not different between adolescent and adult experts (Fig. 7h; p = 0.9374, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). This suggests that there are differences that emerge when we measure during behavior. Also note that behavior may rely on processing beyond ACx, and it is possible that downstream areas compensate for weaker cortical discriminability in adolescents, but this issue merits further investigation.

      (4) Why do you think that the discrimination in hard tasks decreases with learning (Figure 6D vs Figure 6F)?

      This is another nuanced point, and we can only speculate at this stage. While it may appear counterintuitive that single-neuron discriminability (AUC) for the hard task is reduced after learning (Fig. 6D vs. 6F), we believe this may reflect a shift in sensory coding in expert animals. In a recent study (Haimson et al., 2024; Science Advances), we found that learning alters single-neuron responses in the easy versus hard task in complex and distinct ways, which may account for this result. It is also possible that, in expert mice, top-down mechanisms such as feedback from higher-order areas act to suppress or stabilize sensory responses in auditory cortex, reducing the apparent stimulus selectivity of single neurons (e.g., AUC), even as behaviorally relevant information is preserved or enhanced at the population level.

      Reviewer #2 (Recommendations for the authors):

      This is very interesting work and I enjoyed reading the manuscript. See below for my comments, queries and suggestions, which I hope will help you improve an already very good paper.

      We thank the reviewer for the meticulous and thoughtful review.

      (1) Line 107: x-axis of panel 1e says 'pre-adolescent'.

      (2) Line 130: replace 'less' with 'fewer'.

      (3) Line 153: 'both learned and catch trials': I find the terminology here a bit confusing. I would typically understand a catch trial to be a trial without a stimulus but these 'catch' trials here have a stimulus. It's just that they are not rewarded/punished. What about calling them probe trials instead?

      We corrected the labelling (1), reworded to ‘fewer’ and ‘probe trials’ (2,3).

      (4) Line 210: The results of the optogenetics experiments are very interesting. In particular, because the effect is so dramatic and much bigger than what has been reported in the literature previously, I believe. Lick rates are dramatically reduced suggesting that the mice have pretty much stopped engaging in the task and the authors very rightly state that the 'execution' of the behavior is affected. I think it would be worth discussing the implications of these results more thoroughly, perhaps also with respect to some of the lesion work. Useful discussions on the topic can be found, for instance, in Otchy et al., 2015; Hong et al., 2018; O'Sullivan et al., 2019; Ceballo et al., 2019 and Lee et al., 2024. Are the mice unable to hear anything in laser trials and that is why they stopped licking? If they merely had trouble distinguishing them then we would perhaps expect the psychometric curves to approach chance level, i.e. to be flat near the line indicating a lick rate of 0.5. Could the dramatic decrease in lick rate be a motor issue? Can we rule out spillover of the virus to relevant motor areas? (I understand all of the 200nL of the virus were injected at a single location) Or are the effects much more dramatic than what has been reported previously simply because the GtACR2 is much more effective at silencing the auditory cortex? Could the effect be down to off-target effects, e.g. by removing excitation from a target area of the auditory cortex, rather than the disruption of cortical processing?

      We have now expanded the discussion in the manuscript to more thoroughly consider alternative interpretations of the strong behavioral effect observed during ACx silencing (L495–511). In particular, we acknowledge that the suppression of licking may reflect not only impaired sensory discrimination but also broader disruptions to arousal, motivation, or motor readiness. We also discuss the potential impact of viral spread, circuit-level off-target effects, and the potency of GtACR2 as possible contributors. We highlight the need for future work using more graded or temporally precise manipulations to resolve these issues.

      (5) Line 226: Reference 19 (Talwar and Gerstein 2001) is not particularly relevant as it is mostly concerned with microstimulation-induced A1 plasticity. There are, however, several other papers that should be cited (and potentially discussed) in this context. In particular, O'Sullivan et al., 2019 and Ceballo et al., 2019 as these papers investigate the effects of optogenetic silencing on frequency discrimination in head-fixed mice and find relatively modest impairments. Also relevant may be Kato et al., 2015 and Lee et al., 2024, although they look at sound detection rather than discrimination.

      We changed the references and pointed the reader to the new Discussion section.

      (6) Line 253: 'engaged [in] the task.

      (7) Figure 4: It appears that panel S4-1d is not referred to anywhere in the main text.

      Fixed.

      (8) Line 260: Might be useful to explain a bit more about the motivation behind focusing on L5/L6. Are there mostly theoretical considerations, i.e. would we expect the infragranular layers to be more relevant for understanding the difference in task performance? Or were there also practical considerations, e. g. did the data set contain mostly L5/L6 neurons because those were easier to record from given the angle at which the probe was inserted? If those kinds of practical considerations played a role, then there is nothing wrong with that but it would be helpful to explain them for the benefit of others who might try a similar recording approach.

      There were no deep theoretical considerations for targeting L5/6. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L463–467). See also comment D of reviewer 3.

      (9) Supplementary Table 2: The numbers in brackets indicate fractions rather than percentages.

      Fixed.

      (10) Figure S4-3: The figure legend implies that the number of neurons with significant discriminability for the hard stimulus and significant discriminability for choice was identical. (adolescent neurons = 368, mice = 5, recordings = 10; adult n = 544, mice = 6, recordings = 12 in both cases). Presumably, that is not actually the case and rather the result of a copy/paste operation gone wrong. Furthermore, I think it would be helpful to state the fractions of neurons that can discriminate between the stimuli and between the choices that the animal made in the main text.

      Thank you for spotting the mistake. We corrected the n’s and added the percentage of neurons that discriminate stimulus and choice in the main text and the figure legend.

      (11) Line 301: 'We used a ... decoder to quantify hit versus correct reject trial outcomes': I'm not sure I understand the rationale here. For the single unit analysis hit and false alarm trials were compared to assess their ability to discriminate the stimuli. FA and CR trials were compared to assess whether neurons can encode the choice of the mice. But the hit and CR trials which are contrasted here differ in terms of both stimulus and behavior/choice so what is supposed to be decoded here, what is supposed to be achieved with this analysis?

      Thank you for this important point. You're correct that comparing hit and CR trials captures differences in both stimulus and choice, that is, task-related differences. We chose this contrast for the population decoding analysis to achieve higher trial counts per session and similar numbers of trials in the two classes, both of which are necessary for the reliability of the analysis. While this approach does not isolate stimulus from choice encoding, it provides an overall measure of how well population activity distinguishes task-relevant outcomes. We explicitly acknowledge this issue in L313-314.

      (12) Line 332: What do you mean when you say the novice mice were 'otherwise fully engaged' in the task when they were not trained to do the task and are not doing the task?

      By "otherwise fully engaged," we mean that novice mice were actively participating in the task environment, similar to expert mice: they were motivated by thirst and licked the spout to obtain water. The key distinction is that novice mice had not yet learned the task rules and likely relied on trial-and-error strategies, rather than performing the task proficiently.

      (13) Line 334: 'regardless of trial outcome': Why is the trial outcome not taken into account? What is the rationale for this analysis? Furthermore, in novice mice a substantial proportion of the 'go' trials are misses. In expert mice, however, the proportion of 'miss trials' (and presumably false alarms) will by definition be much smaller. Given this, I find it difficult to interpret the results of this section.

      This approach was chosen to reliably decode a sufficient number of trials for each task difficulty (i.e. expert mice predominantly performed CRs on No-Go trials and novice mice often showed FAs). Utilizing all trial outcomes ensured that we had enough trials for each stimulus type to accurately estimate the AUCs. This approach avoids introducing biases due to uneven trial numbers across learning stages.
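
      As background for this point, the area under the ROC curve (AUC) for a single neuron equals the probability that a response drawn at random from one trial type exceeds a response drawn at random from the other, with ties counted as one half. The sketch below illustrates that general definition only; it is not the authors' analysis pipeline, and the function name and the example spike counts are hypothetical. Because the estimate is an average over all pairs of trials, having very few trials of one type makes it coarse and noisy, which is the motivation given above for pooling across trial outcomes.

```python
def roc_auc(responses_a, responses_b):
    """AUC as P(a > b) + 0.5 * P(a == b) over all pairs of trials.

    responses_a, responses_b: per-trial responses (e.g. spike counts)
    for the two stimulus types being compared.
    """
    if not responses_a or not responses_b:
        raise ValueError("both trial types need at least one trial")
    wins = 0.0
    for a in responses_a:
        for b in responses_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(responses_a) * len(responses_b))


if __name__ == "__main__":
    go_counts = [5, 6, 7, 9]      # hypothetical spike counts, Go tone
    nogo_counts = [1, 2, 3, 6]    # hypothetical spike counts, No-Go tone
    print(roc_auc(go_counts, nogo_counts))
```

      An AUC of 0.5 corresponds to no discriminability, and values approaching 0 or 1 correspond to perfect discriminability in either direction.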

      (14) Line 378: 'differences between adolescents and adults arise primarily from age': Are there differences in any of the metrics shown in 7e-h between adolescents and adults?

      We confirm that differences between adolescents and adults are indeed present in some metrics but not others in Figure 7e–h. Specifically, while tuning bandwidth was similar in novice animals, it was significantly lower in adult experts (Fig. 7e; novice: p = 0.0882; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The population sparseness was similar in both novice and expert adolescent and adult neurons (Fig. 7f; novice: p = 0.2873; expert: p = 0.1017, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The distance to the easy go stimulus was similar in novice animals, but lower in adult experts (Fig. 7g; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The neuronal d-prime was similar in both novice and expert adolescent and adult neurons (Fig. 7h; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript).

      (15) Line 475: '...well and beyond...': something seems to be missing in this statement.

      (16) Line 487: 'onto' should be 'into', I think.

      (17) Line 610 and 613: '3 seconds' ... '2.5 seconds': Was the response window 3s or 2.5s?

      (18) Line 638: 'set' should be 'setup', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (19) Line 643: 'Reward-reinforcement was delayed to 0.5 seconds after the tone offset': Presumably, if they completed their fifth lick later than 0.5 seconds after the tone, the reward delivery was also delayed?

      Apologies for the lack of clarity. In the head-fixed version, there was no lick threshold. Mice were reinforced after a single lick. If that lick occurred after the 0.5-second reinforcement delay following tone offset, the reward or punishment was delivered immediately upon licking.

      (20) Line 661: 'effect [of] ACx'.

      (21) Line 680: 'a base-station connected to chassis'. The sentence sounds incomplete.

      (22) Line 746: 'infliction', I believe, should say 'inflection'.

      (23) Line 769: 'non-auditory responsive units': Shouldn't that simply say 'non-responsive units'? The way it is currently written I understand it to mean that these units were responsive (to some other modality perhaps) but not to auditory stimulation.

      (24) Line 791: 'bins [of] 50ms'.

      (25) Line 811: 'all of' > 'of all'.

      (26) Line 814: Looks like the previous paragraph on single unit analysis was accidentally repeated under the wrong heading.

      (27) Line 817: 'encoded' should say 'calculated', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (28) Line 869: 'bandwidth of excited units': Not sure I understand how exactly the bandwidth, i.e. tuning width was measured.

      We acknowledge that our previous answer was unclear and expanded the Methods section. To calculate bandwidth, we identified significant tone-evoked responses by comparing activity during the tone window to baseline firing rates at 62 dB SPL (p < 0.05). For each neuron, we counted the number of contiguous frequencies with significant excitatory responses, subtracting isolated false positives to correct for chance. We then converted this count into an octave-based bandwidth by multiplying the number of frequency bins by the octave spacing between them (0.1661 octaves per step).
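
      The bandwidth conversion described above can be illustrated with a minimal sketch. It assumes that each neuron's significance tests are already available as a list of booleans ordered by tone frequency, and that "contiguous frequencies" refers to the longest unbroken run of significant bins; both are assumptions made for illustration. The function names are hypothetical, and the chance correction for isolated false positives is not implemented here.

```python
# Octave spacing between adjacent tone frequencies, as stated in the response.
OCTAVES_PER_STEP = 0.1661


def longest_contiguous_run(significant):
    """Length of the longest unbroken run of True values (assumed reading of 'contiguous')."""
    best = 0
    current = 0
    for flag in significant:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def bandwidth_octaves(significant, octaves_per_step=OCTAVES_PER_STEP):
    """Convert a count of contiguous significant frequency bins into octaves."""
    return longest_contiguous_run(significant) * octaves_per_step


# Hypothetical neuron: three adjacent significant bins plus one isolated bin.
example = [False, True, True, True, False, True]
print(round(bandwidth_octaves(example), 4))
```

      Under these assumptions the isolated bin does not extend the run, so the example neuron has a bandwidth of three steps.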

      (29) Line 871: 'population sparseness': Is that the fraction of tone frequencies that produced a significant response? I would have thought that this measure is very highly correlated to your measure of bandwidth, to the point of being redundant, but I may have misunderstood how one or the other is calculated. Furthermore, the Y label of Figure 7f says 'responsiveness' rather than sparseness and that would seem to be the more appropriate term because, unless I am misunderstanding this, a larger value here implies that the neuron responded to more frequencies, i.e. in a less sparse manner.

      We have clarified the use of the term "population sparseness" and updated the Y-axis label in Figure 7f to better reflect this measure. This metric reflects the fraction of tone–attenuation combinations that elicited a significant excitatory response across the entire population of neurons, not within individual units.

      While this measure is related to bandwidth, it captures a distinct property of the data. Bandwidth quantifies how broadly or narrowly a single neuron responds across frequencies at a fixed intensity, whereas population sparseness reflects how distributed responsiveness is across the population as a whole. Although the two measures are related, since broadly tuned neurons often contribute to lower population sparseness, they capture distinct aspects of neural coding and are not redundant.

      (30) Line 881: I think this line should refer to Figure 7h rather than 7g.

      Fixed.

      Reviewer #3 (Recommendations for the authors):

      (1) In the Educage, water was only available when animals engaged in the task; however, there is no mention of whether/how animal weight was monitored.

      In the Educage, mice had continuous access to water by voluntarily engaging in the task, which they could perform at any time. Although body weight was not directly monitored, water access was essentially ad libitum, and mice performed hundreds of trials per day, thereby ensuring sufficient daily intake. This approach allowed us to monitor hydration (ad libitum food is supplied in the home cage). The 24/7 setup, including automated monitoring of trial counts and water consumption, was reviewed and approved by our institutional animal care and use committee (IACUC).

      (2) In Figure 2B-C and Figure 2E, the y-axis reads "lick rate". At first glance, I took this to mean "the frequency of licking" (i.e. an animal typically licks at a rate of 5 Hz). However, what the authors actually are plotting here is the proportion of trials on which an animal elicited >= 5 licks during the response window (i.e. the proportion of "yes" responses). I recommend editing the y-axis and the text for clarity.

      We replaced the y-label and adjusted the figure legend (Fig. 2).
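
      The quantity the reviewer describes in comment (2), the proportion of trials counted as "yes" responses, can be written as a short sketch. The threshold of five licks follows the reviewer's description; the function name and the input format (one lick count per trial within the response window) are hypothetical.

```python
def proportion_yes_responses(licks_per_trial, threshold=5):
    """Fraction of trials with at least `threshold` licks in the response window."""
    if not licks_per_trial:
        raise ValueError("at least one trial is required")
    yes_trials = sum(1 for n in licks_per_trial if n >= threshold)
    return yes_trials / len(licks_per_trial)


# Hypothetical session: two of four trials reach the five-lick threshold.
print(proportion_yes_responses([0, 5, 7, 2]))
```

      This value is bounded between 0 and 1, unlike a licking frequency in Hz, which is the distinction the reviewer asks the axis label to make clear.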

      (3) I didn't see any examples of raw (filtered) voltage traces. It would be worth including some to demonstrate the quality of the data.

      We have added an example of a filtered voltage trace aligned to tone onset in Fig. S4-1a to illustrate data quality. In addition, all raw and processed voltage traces, along with relevant analysis code, are available through our GitHub repository and the corresponding dataset on Zenodo.

      (4) The description of the calculation of bias (C) in the methods section (lines 749-750) is incorrect. The correct formula is C = -0.5 * [z(hit rate) + z(fa rate)]. I believe this is the formula that the authors used, as they report negative C values. Please clarify or correct.

      Thanks for spotting this. It is now corrected.
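
      For reference, the criterion formula given in comment (4) can be evaluated as in the sketch below, where z is the inverse of the standard normal cumulative distribution function. The function names are hypothetical, and the sketch assumes hit and false-alarm rates strictly between 0 and 1; rates of exactly 0 or 1 would need a correction that is not shown here.

```python
from statistics import NormalDist


def z_score(rate):
    """Inverse standard normal CDF; defined only for 0 < rate < 1."""
    return NormalDist().inv_cdf(rate)


def criterion_c(hit_rate, fa_rate):
    """Response bias: C = -0.5 * [z(hit rate) + z(false-alarm rate)]."""
    return -0.5 * (z_score(hit_rate) + z_score(fa_rate))


# Hypothetical rates: a liberal responder (many hits, many false alarms).
print(criterion_c(0.9, 0.5) < 0)
```

      With this sign convention, a tendency to respond "yes" gives negative C, which matches the negative values the reviewer notes in the manuscript.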

      (5) The authors use the terms 'naïve' and 'novice' interchangeably. I suggest sticking with one term to avoid potential confusion.

      (6) Multiple instances: "less trials/day" should be "fewer trials/day"

      (7) Supplementary Table 2: The values reported are proportions, not percentages. Please correct.

      (8) Line 270: Table 2 does not show the number of neurons in the dataset categorized by region. Perhaps the authors meant Supplementary Table 2?

      Fixed. Thank you for pointing these mistakes out.

      (9) Figure 5C: the data from the hard task are entirely obscured by the data from the easy task. I recommend splitting it into two different plots.

      We agree and split the decoding of the easy and the hard task into two graphs (left: easy task; right: hard task). Thank you!

      (10) How many mice contributed to each analyzed data set? Could the authors provide a breakdown in a table somewhere of how many neurons were recorded in each mouse and which ones were included in which analyses?

      We added an overview of the analyzed datasets in supplementary Table 7. Please note that the number of mice and neurons used in each analysis is also reported in the main text and legends. Importantly, all primary analyses were conducted using LME models, which explicitly account for hierarchical data structure and inter-mouse variability, thereby addressing potential concerns about data imbalance or bias.

    1. eLife Assessment

      This study presents valuable findings on the role of dopamine receptor D2R in dopaminergic neurons DAN-c1 and mushroom body neurons (Y201-GAL4 pattern) on aversive and appetitive conditioning. The evidence supporting the claims of the authors is solid in the context of their behavioural paradigm. Controls using a reciprocal training protocol would have broadened the scope of their conclusions. The work will be of interest to researchers studying the role of dopamine during learning and memory.

    2. Reviewer #1 (Public review):

      Summary:

      Both flies and mammals have D1-like and D2-like dopamine receptors, yet the role of D2-like receptors in Drosophila learning and memory remains underexplored. This paper investigates the role of the D2-like dopamine receptor D2R in single pairs of dopaminergic neurons (DANs) during single-odor aversive learning in the Drosophila larva. First, confocal imaging is used to screen GAL4 driver strains that drive GFP expression in just single pairs of dopaminergic neurons. Next, thermogenetic manipulations of one pair of DANs (DAN-c1) suggest that DAN-c1 activity during larval aversive learning is important. Confocal imaging is then used to reveal expression of D2R in the DANs and mushroom body of the larval brain. Finally, optogenetic activation during training phenocopies D2R knockdown in these neurons: aversive learning is impaired when DAN-c1 is targeted, while appetitive and aversive learning are impaired when the mushroom body is manipulated. Finally, a model is proposed in which D2R limits excessive dopamine release to facilitate successful olfactory learning.

      Strengths:

      The paper convincingly reproduces prior findings that demonstrated D2R knockdown in DL1 DANs or the mushroom body impairs aversive olfactory learning in Drosophila larvae (Qi and Lee, 2014; doi:10.3390/biology3040831). These previous findings were built upon and extended with a comprehensive confocal imaging screen of 57 GAL4 drivers that identified tools driving GFP expression in individual DANs. One of the drivers, R76F02-AD; R55C10-DBD, was consistently shown to label DAN-c1 neurons and no other DANs in the larval brain. Confocal imaging is also used to demonstrate that GFP-tagged D2R is expressed in most DANs and the mushroom body. Behavioral experiments demonstrate that driving D2R knockdown in DAN-c1 neurons impairs aversive learning, as do other loss-of-function manipulations of DAN-c1 neurons.

      Limitations:

      (1) The single-odor paradigm used to train larvae does not have the advantages of a more conventional balanced or reciprocal training paradigm. The paper describes how the single-odor experimental design could be controlled for non-associative effects, but does not provide an independent validation of the control experiments performed by a different research group using different odors and genotypes 15 to 20 years ago (see Honjo and Furukubo-Tokunaga, 2005; doi:10.1523/jneurosci.2135-05.2005 and Honjo and Furukubo-Tokunaga, 2009; doi:10.1523/jneurosci.1315-08.2009). Whether the involvement of DAN-c1 for aversive learning generalizes to standard paradigms remains unclear (see Eschbach et al., 2020; doi:10.1038/s41593-020-0607-9 and Weber et al., 2023; doi:10.7554/elife.91387.1).

      (2) In 11 of 22 larval brains examined in the paper, R76F02-AD; R55C10-DBD appears to drive GFP expression in 1 to 8 additional non-dopaminergic neurons (Figure S1P and Table S3). Of the remaining 11 brains, 4 of their corresponding ventral nerve cords also have expression in 2 to 4 neurons (Table S3). Therefore, experiments involving the R76F02-AD; R55C10-DBD driver could be manipulating the activity of additional neurons in around 60% of larvae. The conclusions of the paper would be strengthened if key experiments were repeated with other GAL4 drivers that may label DAN-c1 with even greater specificity, such as SS03066 (Truman et al., 2023; doi:10.7554/elife.80594) or MB320C (Hige et al., 2015; doi:10.1016/j.neuron.2015.11.003).

      (3) Successful immunostaining with an anti-D2R antibody (Draper et al., 2007; doi:10.1002/dneu.20355 and Love et al., 2023; doi:10.1111/gbb.12836) could validate GFP-tagged D2R expression (Figure 3) in the same way that TH immunostaining was used throughout the paper to determine whether neurons were dopaminergic.

      (4) The paper proposes a model in which DAN-c1 activity conveys an aversive teaching signal (Figure 2f) but excessive artificial DAN-c1 activation causes excessive dopamine release that impairs aversive learning (Figures 2i and 5b). According to this model, thermogenetic DAN-c1 activation during training with water or sucrose conveys an aversive teaching signal that reduces performance (Figure 2i) whereas optogenetic DAN-c1 activation does not due to excessive dopamine release (Figures 5c and 5d). The model suggests that optogenetic DAN-c1 activation is strong enough to cause excessive dopamine release by itself whereas thermogenetic DAN-c1 activation can only achieve the same outcome when it occurs in conjunction with natural DAN-c1 activation evoked by quinine. Therefore, an experiment with weaker optogenetic DAN-c1 activation (with lower intensity light or pulsed at a lower frequency) during water or sucrose training would be expected to convey an aversive teaching signal rather than excessive dopamine release, reducing performance. Such an experiment could reconcile the differing thermogenetic and optogenetic results of the paper.

    3. Reviewer #2 (Public review):

      Summary:

      The study set out to functionally identify individual DANs that mediate larval olfactory learning. The authors first searched for DAN-specific driver strains that mark single dopaminergic neurons and can subsequently be used to target genetic manipulations to those neurons. 56 GAL4 drivers identifying dopaminergic neurons were found (Table 1), and three of them drive GFP expression in a single dopaminergic neuron per third-instar larval brain hemisphere. The DAN driver R76F02-AD;R55C10-DBD appears to drive expression in a dopaminergic neuron innervating the lower peduncle (LP), which would be DAN-c1.

      Split-GFP reconstitution across synaptic partners (GRASP) technique was used to investigate the "direct" synaptic connections from DANs to the mushroom body. Potential synaptic contact between DAN-c1 and MB neurons (at the lower peduncle) were detected.

      Then single-odor associative learning was performed and thermogenetic tools were used (Shi-ts1 and TrpA1). When trained at 34 °C, the complete inactivation of dopamine release from DAN-c1 with Shibirets1 impaired aversive learning (Figure 2h), while Shibirets1 did not affect learning when trained at room temperature (22 °C). When paired with a gustatory stimulus (QUI or SUC), activation of DAN-c1 during training impairs both aversive and appetitive learning (Figure 2k).

      The authors then examined the expression pattern of D2R in fly brains and found it in dopaminergic neurons and the mushroom body (Figure 3). To inspect whether the pattern of GFP signals indeed reflected the expression of D2R, three D2R enhancer driver strains (R72C04, R72C08, and R72D03-GAL4) were crossed with the GFP-tagged D2R strain.

      D2R knockdown (UAS-RNAi) in dopaminergic neurons driven by TH-GAL4 impaired larval aversive learning. Using a microRNA strain (UAS-D2R-miR), a similar deficit was observed. Crossing the GFP-tagged D2R strain with a DAN-c1-mCherry strain demonstrated the expression of D2R in DAN-c1 (Figure 4a). Knockdown of D2R in DAN-c1 impaired aversive learning with the odorant pentyl acetate, while appetitive learning was unaffected (Figure 4e). Sensory and motor functions appear not affected by D2R suppression.

      To exclude possible chronic effects of D2R knockdown during development, optogenetics was applied at distinct stages of the learning protocol. ChR2 was expressed in DAN-c1, and blue light was applied at distinct stages of the learning protocol. Optogenetic activation of DAN-c1 during training impaired aversive learning, not appetitive learning (Figure 5b-d).

      Knockdown of D2Rs in MB neurons by D2R-miR impaired both appetitive and aversive learning (Figure 6a). Activation of MBNs during training impairs both larval aversive and appetitive learning.

      Finally, based on the data the authors propose a model where the effective learning requires a balanced level of activity between D1R and D2R (Figure 7).

      Strengths:

      The work is well written, clear, and concise. They use well documented strategies to examine GAL4 drivers with expression in a single DAN, behavioral performance in larvae with distinct genetic tools including those to do thermo and optogenetics in behaving flies. Altogether, the study was able to expand our understanding of the role of D2R in DAN-c1 and MB neurons in the larva brain.

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data and the model of adequate levels of cAMP (Figure 7b) appears to be able to explain a poor memory after insufficient or excessive cAMP signaling. The study provides insight into the role of D2R in associative learning expanding our understanding and might be a reference similarly to previous key findings (Qi and Lee, 2014, https://doi.org/10.3390/biology3040831).

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness#1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As described in the Results section, we screened 57 GAL4 driver lines based on previous reports. These included drivers that had been shown to label a single dopaminergic neuron (DAN) or a small subset of DANs in the larval or adult brain hemisphere, suggesting potential for specific DAN labeling in larvae.

      In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 were well known drivers for pPAM. Fly strains in Figure 1h, k, l, and m were reported as single DAN strains in larvae[1], while strains in Figure 1e, f, g were reported identifying only several DANs in adult brains[2,3]. We examined these strains and only some of them labeled single DANs in 3rd instar larval brain hemisphere (Figure 1f, g, h, l and m). Among them, only strains in Figure 1f and h labeled single DAN in the brain hemisphere, without labeling other non-DANs. Other strains labeled non-DANs in addition to single DANs (Figure 1g, l and m). Taking ventral nerve cord (VNC) into consideration, strain in Figure 1h also labeled neurons in VNC (Figure S1e), while strain in Figure 1f did not (Figure S1c).

      In summary, the driver shown in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is the only line we identified that labels a single DAN in the 3rd instar larval brain hemisphere without additional labeling. The other lines shown in Figure 1 (g, h, l, m) label a single DAN but also include some non-DANs. Figure 1 focuses on strains that label a single or a pair of DANs.

      Labeling patterns for all 57 driver lines are summarized in Table 1. Figure S1 includes representative examples; full confocal images for all screened strains are available upon request, as stated in the figure legend.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single dopaminergic (DA) neuron in each brain hemisphere. While additional GFP-positive signals were occasionally observed, they did not originate from the cell bodies of DA neurons, as these were not labeled by the tyrosine hydroxylase (TH) antibody. These additional GFP signals primarily appeared to be neurites, including axonal terminals, although we cannot rule out the possibility that some represent false-positive signals or weakly stained non-neuronal cell bodies. This interpretation is based on the analysis of 22 third-instar larval brains.

      To clarify this point in the manuscript, we added the following sentence to the Results section: “Based on the analysis of 22 brain samples, we observed this driver strain labels one neuron per hemisphere in the third-instar larval brain (Figure 2a–d, Figure S1c, Table S3).” Additionally, Table S3 was included to summarize the DAN-c1 labeling pattern across all 22 samples. An enlarged inset highlighting GFP-positive signals was also added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for this insightful suggestion. The MB320C driver primarily labels the PPL1-γ1pedc neuron in the adult brain, along with one or two additional weakly labeled cells. It would indeed be interesting to examine the expression pattern of this driver in third-instar larval brains. If it is found to label only DAN-c1 at this stage, we could consider using it to knock down D2R and assess whether this recapitulates our current findings.

      While we agree that this is a promising direction for future studies, we believe it is not essential for the current manuscript, given the specificity of the DAN-c1 driver (please see our response to Reviewer #3 for details). Nonetheless, we appreciate the reviewer’s suggestion, and we recognize that MB320C could be a valuable tool for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images showing DANs in brains of the SS02160 driver cross line. However, Extended Data Figure 1 in Eschbach et al. shows four strongly labeled neurons in each brain hemisphere[4], indicating that this driver does not label DAN-c1 alone.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree with the reviewer that the terms “necessary” and “sufficient” may be too exclusive and could unintentionally exclude contributions from other neurons. As noted in the Discussion section, we acknowledge that additional dopaminergic neurons may also play roles in larval aversive learning. To reflect this, we have revised our wording to use “important” and “mediates” instead of the more definitive terms “necessary” and “sufficient,” making our conclusions more accurate and appropriately measured.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is an excellent point, and we agree that we cannot rule out the possibility that artificial activation interferes with aversive learning by overriding the natural activity of DAN-c1 that would normally be evoked by quinine. The observed results with TRPA1 could potentially be attributed to dopamine depletion, inactivation due to prolonged depolarization, or neural adaptation. However, we believe that our hypothesis - that over-excitation of DAN-c1 impairs learning - is more consistent with our experimental findings and with previously published data. Our rationale is as follows:

      (1) Associative learning in larvae occurs only when the conditioned stimulus (CS, e.g., an odor such as pentyl acetate) and unconditioned stimulus (US, e.g., quinine) are paired. In wild-type larvae, the CS depolarizes a subset of Kenyon cells in the mushroom body (MB), while the US induces dopamine (DA) release from DAN-c1 into the lower peduncle (LP) compartment (Figure 7a). When both stimuli coincide, calcium influx from CS activation and Gαs signaling via D1-type dopamine receptors activate the MB-specific adenylyl cyclase, rutabaga, which functions as a coincidence detector (Figure 7d).

      (2) Rutabaga converts ATP to cAMP, activating the PKA signaling pathway and modifying synaptic strength between Kenyon cells and mushroom body output neurons (MBONs) (Figure 7d). These changes in synaptic strength underlie learned behavioral responses to future presentations of the same odor.

      (3) Our results show that D2R is expressed in DAN-c1, and that D2R knockdown impairs aversive learning. Since D2Rs typically inhibit neuronal excitability and reduce cAMP levels[5], we hypothesize that D2R acts as an autoreceptor in DAN-c1 to restrict DA release. When D2R is knocked down, this inhibition is lifted, leading to increased DA release in response to the US (quinine). The resulting excess DA, in combination with CS-induced calcium influx, would elevate cAMP levels in Kenyon cells excessively - disrupting normal learning processes (Figure 7b). This is supported by studies showing that dunce mutants, which have elevated cAMP levels, also exhibit aversive learning deficits[6].

      (4) The TRPA1 activation results are consistent with our over-excitation model. When DAN-c1 was artificially activated at 34 °C in the distilled water group, this mimicked the natural activation by quinine, producing an aversive learning response toward the odor (Figure 2k or new Figure 2i, DW group). Similarly, in the sucrose group, artificial activation mimicked quinine, producing a learning response that reflected both appetitive and aversive conditioning (Figure 2k, SUC group).

      (5) Over-excitation impairs learning in the quinine group. When DAN-c1 was activated during quinine exposure, both artificial and natural activation combined to produce excessive DA release. This over-excitation likely disrupted the cAMP balance in Kenyon cells, impairing learning and resulting in failure of aversive memory formation (Figure 2k, QUI group). This phenotype closely mirrors the effect of D2R knockdown in DAN-c1.

      (6) Optogenetic activation of DAN-c1 during aversive training similarly produced elevated DA levels due to both natural and artificial stimulation. This again would result in MBN over-excitation and a corresponding learning deficit. When optogenetic activation occurred during non-training phases (resting or testing), no additional DA was released during training, and aversive learning remained intact (Figure 5b).

      (7) Notably, when optogenetic activation was applied during training, we observed no aversive learning in the distilled water group and no reduction in the sucrose group (Figure 5c, 5d). We interpret this as evidence that the optogenetic stimulation was strong enough to cause elevated DA release in both groups, impairing learning in a manner similar to D2R knockdown or TRPA1 overactivation.

      (8) We extended this over-excitation framework to directly activate Kenyon cells (MBNs). Since MBNs are involved in both appetitive and aversive learning, their over-excitation disrupted both types of learning (Figure 6), further supporting our hypothesis.

      In summary, we propose that DAN-c1 activity is tightly regulated by D2R autoreceptors to ensure appropriate levels of dopamine release during aversive learning. Disruption of this regulation - either through D2R knockdown or artificial overactivation of DAN-c1 - results in excessive DA release, over-excitation of Kenyon cells, and impaired learning. This over-excitation model is consistent with both our experimental results and prior literature.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      Just like the example of TH-GAL4, it is possible that the D2R driver strains may partially reflect the expression pattern of endogenous D2R in larval brains. When we crossed the D2R driver strains with the GFP-tagged D2R strain, however, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3c to h). In addition, D2R knockdown with D2R-miR directly supported that the GFP-tagged D2R strain reflected the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains supported the expression pattern we observed from the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody originally described by Draper et al.[6]. We attempted to use the same antibody in our experiments; however, we were unable to detect clear signals following staining. This may be due to a lack of specificity for neurons in the Drosophila larval brain or incompatibility with our staining protocol. Unfortunately, we were unable to locate a copy of the Lam (1999) paper for further reference.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We re-analyzed the data related to DAN-g1. Interestingly, knockdown of D2R in DAN-g1 larvae trained with quinine (QUI) showed a significant difference in response index (R.I.) compared to the distilled water (DW) control group. However, it also differed significantly from the DAN-g1 genetic control group trained with QUI (two-way ANOVA with Tukey’s multiple comparisons, p = 0.0002), while it was not significantly different from the UAS-D2R-miR genetic control group (p = 0.2724). Furthermore, knockdown of D2R in DAN-g1 did not lead to aversive learning deficits when larvae were trained with a different odorant, propionic acid (ProA; Figure S5a). Similarly, using an RNAi line to knock down D2R in DAN-g1 did not result in learning impairment when larvae were trained with pentyl acetate (PA; Figure S5b). These inconsistencies may stem from differences in stimulus intensity across odorants, as well as the variable efficiency of the knockdown strategies (microRNA vs. RNAi). Based on these results, we propose that D2Rs in DAN-g1 may modulate larval aversive learning in a quantitative manner but do not play as critical a role as those in DAN-c1, where knockdown produces a clear qualitative effect. We have added this paragraph to the Discussion section of the manuscript.

      Reviewer #2 (Public review):

      Weakness#1: It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2K). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Reviewer #1 above.

      Reviewer #3 (Public review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and thoughtful suggestions.

      Regarding the R76F02AD; R55C10DBD strain, we examined 22 third instar larval brains expressing GFP, Syt-GFP, or Den-mCherry. All brains clearly labeled DAN-c1. In approximately half of the samples, only DAN-c1 was labeled. In the remaining samples, 1 to 5 additional weakly labeled soma were observed, typically without associated neurites. Only 1 or 2 strongly labeled non-DAN-c1 cells were occasionally detected. These additional labeled neurons were rarely dopaminergic. In the ventral nerve cord (VNC), 8 out of 12 samples showed no labeled cells. The remaining 4 samples had 2–4 strongly labeled cells. These results support our conclusion that the R76F02AD; R55C10DBD combination predominantly and specifically labels DAN-c1 in the third instar larval brain. As for the reviewer’s question about the expression pattern of R76F02AD; R55C10DBD and D2R in the larval body, we agree that this is a very interesting avenue for further investigation. However, our current study is focused on the central nervous system and larval learning behaviors. We hope to explore this question more fully in future work.

      We added the following sentence to the Results section: “Based on analysis of 22 brain samples, we believe this driver strain consistently labels one neuron per hemisphere in the third-instar larval brain (Figure 2a - d, Figure S1c, Table S3).” In addition, we included Table S3 to summarize the DAN-c1 labeling patterns observed across these samples.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).

      As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted the single-odor larval learning paradigm from Honjo et al., who first developed and validated this method for studying larval olfactory associative learning[7,8]. To address the reviewer’s concern regarding potential non-associative effects from 30-minute exposure to quinine or sucrose, we refer to multiple lines of evidence provided in Honjo’s studies: (1) Honjo et al. demonstrated that only larvae receiving paired presentations of odor and unconditioned stimulus (quinine or sucrose) exhibited learned responses. Exposure to either stimulus alone, or temporally dissociated presentations, failed to induce any learning response. (2) When tested with a second, non-trained odorant, larvae only responded to the odorant previously paired with the unconditioned stimulus. This rules out generalized olfactory suppression and confirms odor-specific associative learning. (3) Well-characterized learning mutants (e.g., rutabaga, dunce) that show deficits in adult reciprocal odor learning also failed to exhibit learned responses in this single-odor paradigm, further supporting its validity. (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid) and two independent D2R knockdown approaches (UAS-miR and UAS-RNAi). We consistently observed that D2R knockdown in DAN-c1 impaired aversive learning. Importantly, naïve olfactory, gustatory, and locomotor assays ruled out general sensory or motor defects. Comparisons with control groups (odor paired with distilled water) also ruled out non-associative effects such as habituation. Taken together, these results strongly support that the single-odor paradigm is a robust and reliable assay for assessing larval olfactory associative learning in Drosophila. We have added a section in the Discussion to clarify and defend the use of this paradigm in our study.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      During the testing stage, larvae were given 5 min to wander on the testing plate. Under most conditions, more than half of the larvae (>50%) explored the plate, while the rest stayed in the middle zone and were not counted. We used 25-50 larvae in each learning assay, so around 10-30 larvae were typically located in the two semicircular areas. In our raw data, an R.I. of 1 seldom appears; most R.I. values fall between -0.2 and 0.8. We acknowledge that the R.I. equation is not linear, so values change more steeply as they approach -1 and 1. However, because most values fall between -0.2 and 0.8, we think such ‘border effects’ can be neglected when enough larvae (10-30) enter the calculation.
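To make the reviewer's concern and our reply concrete, the following is a minimal sketch, not the analysis code used in the study. It assumes the R.I. is the difference between the larvae on the odor side and those on the control side, divided by the sum of those two counts, which is what the reviewer's description implies; the exact equation in the manuscript is not reproduced here. The function names and the example counts are hypothetical.

```python
def response_index(n_odor: int, n_control: int) -> float:
    """R.I. over scored larvae only (assumed formula; middle zone excluded)."""
    scored = n_odor + n_control
    if scored == 0:
        raise ValueError("no larvae reached either semicircular area")
    return (n_odor - n_control) / scored


def response_index_all(n_odor: int, n_control: int, n_middle: int) -> float:
    """Alternative index that keeps middle-zone larvae in the denominator."""
    total = n_odor + n_control + n_middle
    if total == 0:
        raise ValueError("no larvae on the plate")
    return (n_odor - n_control) / total


if __name__ == "__main__":
    # Reviewer's extreme case: 30 larvae, 1 on the odor side, 29 in the middle.
    print(response_index(1, 0))
    print(response_index_all(1, 0, 29))
    # Hypothetical plate with 18 of 30 larvae scored, as in the typical range.
    print(response_index(12, 6))
    print(response_index_all(12, 6, 12))
```

The two indices diverge sharply only when few larvae leave the middle zone, which is why the number of scored larvae per plate matters for interpreting any single R.I. value.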

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibire<sup>ts1</sup> gene encodes a thermosensitive mutant of dynamin. Expressing this mutant in target neurons blocks neurotransmitter release at ambient temperatures above 30°C because it represses vesicle recycling[7]. It is a widely used tool for examining whether a target neuron is involved in a specific physiological function. We cannot rule out Shibire<sup>ts1</sup>-insensitive routes of dopamine release. However, blocking dopamine release from DAN-c1 with Shibire<sup>ts1</sup> already changed the learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      The second question, about potential co-transmitter release, is a great one. Recently, Yamazaki et al. reported that co-neurotransmitters in the dopaminergic system modulate adult olfactory memories in Drosophila[9], and we cannot rule out roles for co-released neurotransmitters/neuropeptides in larval learning. Ideally, observing real-time changes in dopamine release from DAN-c1 in wild-type and TH-knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support the importance of dopamine for Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both adult and larval appetitive and aversive learning[10,11]. In our work, D2R in the mushroom body played important roles in both larval appetitive and aversive learning (Figure 6a). All this evidence points to the importance of dopamine in Drosophila olfactory associative learning. In addition, much remains unknown about co-released neurotransmitters/neuropeptides, as well as their potentially complex ‘interaction/crosstalk’ relations. We believe that investigating co-released neurotransmitters/neuropeptides is beyond the scope of this study at this time.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all controls we used were homozygous parental strains. They did not show abnormal behaviors in learning, naïve sensory, or locomotion assays. The only exception is the control for DAN-c1: larvae from the homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed from affecting learning ability, we used heterozygous R76F02AD; R55C10DBD/wildtype larvae as the control, which showed normal learning, naïve sensory, and locomotion abilities (Figure 4e to i).

      For Figure 4d, it is a column graph to quantify the efficiency of D2R knockdown with miR. Because we need to induce and quantify the knockdown effect in specific DANs (DM1), only TH-GAL4 can be used as the control group, rather than UAS-D2R-miR. For the missing control groups in Figure S4e and S5c, we have shown them in other Figures (Figure 4e).

      We described this in the Materials and Methods part, “All control strains used in learning assays were homozygous (except DAN-c1×WT), while all experimental groups (D2R knockdown and thermogenetics) used were heterozygous by crossing the corresponding control strains”.

      We also re-organized the Figure S4e and S5c along with the control groups to make it easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors interpretation of the matching effects of low-Dop2R and driving DANs.

      We appreciate the reviewer’s suggestion. We read this paper, which also addresses a question we raised in the Discussion section: the discrepancy between cAMP elevation in mushroom body neurons and the reduced MBN-MBON synaptic plasticity after olfactory associative learning in Drosophila. The authors offer an explanation for the existing D1R-cAMP elevation-MBN-MBON LTD axis, which is helpful for understanding the learning mechanism. However, we do not think it offers a possible explanation for our D2R-related mechanisms. We have added this paper to our citations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the behavioral experiments, a defect in aversive learning is defined as a relative increase in the response index (RI) after olfactory training with quinine (red) and a defect in appetitive learning as a relative decrease in RI after training with sucrose (blue). Training with distilled water (yellow) is intended to be a control for comparisons within genotypes/treatment groups but causes interpretation issues if it is also affected by experimental manipulations.

      The authors typically make comparisons between quinine, water, and sucrose within each group, but this often forces readers to infer the key comparisons of interest. For example, the key comparison in Figure 2h is the statistically significant difference between the red groups, which differ only in the temperature used during training. Many other figure panels in the paper would also benefit from more direct statistical comparisons, particularly Figure 2k.

      While I recognize the value of the water control, I strongly recommend that the authors make statistical comparisons directly between genotypes/treatment groups where possible and to interpret results with more caution when the water RI score differs substantially between groups. Also, since the authors are conducting two-way ANOVAs before Dunnett's multiple comparisons tests, they ideally should report the p-value for the main effect of each factor, plus the interaction p-value between the two factors before making multiple comparisons.

      We appreciate the reviewer’s suggestion. In response, we re-analyzed all learning assay data in Figures 2 and 4 using two-way ANOVA followed by Tukey’s multiple comparisons test. Unlike our previous analysis, which only compared each experimental group to its corresponding DW control, we now compared all groups against one another. First, we found that most R.I. values from different temperature conditions (Figure 2) or genotypes (Figure 4) trained with DW were not significantly different, with the exception of the data in Figure 2i (formerly Figure 2k; discussed further below). The R.I. from DAN-c1 × D2R-miR larvae trained with QUI was significantly different from both genotype control groups (DAN-c1 × WT and UAS-D2R-miR), while no significant difference was observed between the two controls trained with QUI. Thus, this more comprehensive statistical approach supports the conclusions we previously reported. Second, as the reviewer noted, the new analysis allows for a more direct interpretation of our findings. For example, in the thermogenetic experiments using the Shibire<sup>ts1</sup> strain, the R.I. of DAN-c1 × UAS-Shibire<sup>ts1</sup> larvae trained with QUI at 34°C was not significantly different from the DW group at 34°C, but was significantly different from the QUI group at 22°C. Both findings support our conclusion that blocking dopamine release from DAN-c1 impairs larval aversive learning (Figure 2f).

      In the dTRPA1 activation experiments, the R.I. of DAN-c1 × UAS-dTRPA1 larvae trained with DW at 34°C was significantly lower than that of the DW group at 22°C and the QUI group at 34°C, but not significantly different from the QUI group at 22°C (Figure 2i). These results indicate that activating DAN-c1 during training is sufficient to drive aversive learning even in the absence of QUI. Interestingly, when DAN-c1 × UAS-dTRPA1 larvae were trained with QUI at 34°C, their R.I. was significantly higher than that of the DW group at 34°C and significantly different from the QUI group at 22°C, but not significantly different from the DW group at 22°C (Figure 2i). We interpret this as evidence that simultaneous activation of DAN-c1 by both QUI and dTRPA1 leads to over-excitation, which in turn impairs aversive learning.

      We have revised the figures (Figures 2, 4, 5, and 6) and updated the corresponding Results sections to reflect this new statistical analysis. Additionally, we now report the p-values for interaction, row factor, and column factor - either in Table S4 (for Figure 2) or in the figure captions for Figures 4, 5, 6, S4, S5, and S7.
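For readers less familiar with the three quantities reported for each two-way ANOVA (row factor, column factor, and interaction), the sketch below computes the corresponding F statistics for a balanced design. It is an illustration only, not the software used for the analysis; the function name is hypothetical, and the toy numbers are invented and do not come from our data. A p-value for each effect would then be read from the F distribution with the effect and error degrees of freedom, which standard statistics packages provide.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way design.

    cells[i][j] is the list of replicate values for row level i and
    column level j; every cell must hold the same number of replicates.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if n < 2 or any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("balanced design with at least 2 replicates required")

    grand = mean(x for row in cells for c in row for x in c)
    cell_m = [[mean(c) for c in row] for row in cells]
    row_m = [mean(cell_m[i]) for i in range(a)]
    col_m = [mean(cell_m[i][j] for i in range(a)) for j in range(b)]

    ss_row = b * n * sum((m - grand) ** 2 for m in row_m)
    ss_col = a * n * sum((m - grand) ** 2 for m in col_m)
    ss_int = n * sum(
        (cell_m[i][j] - row_m[i] - col_m[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (x - cell_m[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    ms_err = ss_err / (a * b * (n - 1))
    return {
        "row": ss_row / (a - 1) / ms_err,
        "column": ss_col / (b - 1) / ms_err,
        "interaction": ss_int / ((a - 1) * (b - 1)) / ms_err,
    }


if __name__ == "__main__":
    # Toy values only: rows could stand for two temperatures,
    # columns for two training solutions.
    toy = [[[1.0, 3.0], [5.0, 7.0]],
           [[2.0, 4.0], [10.0, 12.0]]]
    print(two_way_anova_f(toy))
```

A large interaction F relative to the main effects signals that the effect of one factor depends on the level of the other, which is the situation in which pairwise comparisons across groups need the most caution.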

      (2) The authors' motivation to find tools that label DANs other than DAN-c1 was unclear until much later in the paper when I saw the screening experiments in Figures S4 and S5. The authors could provide a clearer justification for why they focus on DAN-c1 in Figure 2 rather than another DAN for which they found a specific driver in Figure 1. The motivation for looking at individual pPAM neurons was also unclear.

      We sincerely appreciate the reviewer’s thoughtful suggestion. Our study was initially motivated by the goal of characterizing the expression pattern of D2R in the larval brain. From there, we aimed to identify DAN drivers that label specific pairs of dopaminergic neurons, enabling us to assess the functional role of D2R in distinct DAN subtypes through targeted knockdown experiments. This approach ultimately led us to focus on DAN-c1, as it was the only neuronal population for which D2R knockdown resulted in a learning deficit. We then returned to examine the functional significance of DAN-c1 in aversive learning. While we recognize that a more comprehensive narrative might be desirable, the current structure of our manuscript reflects the most logical progression of our work based on our research priorities and experimental outcomes. We did explore alternative manuscript structures - such as beginning with the D2R expression pattern - but found that the current format best conveys our findings and rationale.

      Regarding our motivation to study individual PAM neurons: we aimed to identify whether D2R plays a role in a specific pair of pPAM neurons involved in larval appetitive learning. However, we were unable to find a driver that exclusively labels DAN-j1, which we believe to be the key neuron in this context (see Figure 1). As a result, our investigation into appetitive learning did not progress beyond the observation of D2R expression in pPAM neurons (Figure 3d), and we did not proceed with learning assays in this context. While we acknowledge the limitations of our study, we believe that our focus on DAN-c1 is well-justified based on both our findings and the tools currently available. We respectfully note that a major restructuring of the manuscript would not necessarily clarify the rationale for focusing on DAN-c1, and therefore we have maintained the current organization.

      (3) The authors should also double-check and update the expression patterns of the drivers in Table 1 using references such as the FlyLight online resource. For example, MB438B labels PPL1-α'2α2, PPL1-α3, PPL1-γ1pedc according to FlyLight, not just PPL1-γ1pedc as initially reported by Aso and Hattori et al. (2014).

      We appreciate the reviewer’s suggestion. We have double-checked and updated the driver expression patterns in Table 1, using FlyLight data as a reference.

      (4) Interpreting overlaid green-and-red fluorescence confocal images would be difficult for any colorblind readers; I suggest that the authors consider using a more friendly color set.

      We thank the reviewer for the suggestion. In our study, we need three distinct colors to represent different channels. We also tested an alternative color scheme using cyan, magenta, and yellow (CMY) instead of the standard red, green, and blue (RGB). As a comparison (see below), we used a R76F02AD;R55C10DBD (DAN-c1) GFP-labeled brain as an example. In our evaluation, the RGB combination provided clearer visualization and appeared more natural, while the CMY scheme looked somewhat artificial. Therefore, we decided to retain the original RGB color scheme and did not modify the colors in the figures.

      Author response image 1.

      (5) For Figure 4d, counting each DAN as an individual N would violate the assumption of independence made by the unpaired t test, since multiple DANs are found in each brain and therefore are not independent. Instead, it would be better to count each individual N as the average intensity of the four DANs measured in each brain.

      We revised the analysis of microRNA efficiency by averaging the fluorescence intensity of DANs within each brain, treating each brain as a single sample. Based on this approach, we re-plotted Figure 4d.
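The change in the unit of analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not our analysis script: the brain identifiers, intensity values, and function names are hypothetical, and the pooled-variance unpaired t statistic is shown because the reviewer refers to an unpaired t test. The key step is collapsing the DANs measured in each brain to a single mean before any test is run, so that N equals the number of brains.

```python
from math import sqrt
from statistics import mean, variance


def per_brain_means(brains):
    """Collapse the DAN intensities measured in each brain to one value."""
    return [mean(values) for values in brains.values()]


def unpaired_t(a, b):
    """Pooled-variance unpaired t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb)), df


if __name__ == "__main__":
    # Hypothetical intensities (arbitrary units), four DANs per brain.
    control = {"c1": [98, 102, 100, 100], "c2": [110, 108, 112, 110],
               "c3": [95, 97, 93, 95]}
    knockdown = {"k1": [60, 62, 58, 60], "k2": [70, 72, 68, 70],
                 "k3": [55, 57, 53, 55]}
    t, df = unpaired_t(per_brain_means(control), per_brain_means(knockdown))
    # df reflects the number of brains (3 + 3 - 2), not the number of DANs.
    print(t, df)
```

Treating each DAN as its own sample would instead give 22 degrees of freedom for the same six brains, overstating the evidence.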

      (6) Finally, the authors ought to make it clearer throughout the paper that they have implicated a pair of DAN-c1 neurons in aversive learning, not just a single DAN as currently stated in the title.

      We thank the reviewer for the suggestion about the phrase we are using under this scenario. We have changed all “single neuron” to “a pair of neurons”.

      Reviewer #2 (Recommendations for the authors):

      (1) The results section presents: "Activation of DAN-c1 with dTRPA1 at 34°C during training induced repulsion to PA in the distilled water group (Figure 2k). These data suggested that DAN-c1 excitation and presumably increased dopamine release is sufficient for larval aversive learning in the absence of gustatory pairing." An alternative interpretation is that 30 min of TrpA activation depletes synaptic vesicle pool, or inactivates neurons because of prolonged depolarization, or DAN shows firing rate adaptation (e.g. see Pulver et al. 2009; doi:10.1152/jn.00071.2009). In such a case DA release would be reduced and not increased. Therefore, the interpretation that DAN-c1 activation is both necessary and sufficient in larval aversive learning is difficult to be sustained.

      In this regard it is important to know how the sensory motor abilities are during a thermos-induction at 34°C during 30 min.

      We thank the reviewer for the thoughtful suggestion. Regarding the concern about potential dopamine depletion or neuronal inactivation, we believe a comparison with the Shibire<sup>ts1</sup> experiments helps clarify the interpretation. Activation of Shibire<sup>ts1</sup> during training with distilled water did not result in aversive learning (Figure 2f), which is a distinct phenotype from that observed with dTRPA1 activation (Figure 2i). This suggests that the phenotypes seen with dTRPA1 activation are not due to reduced dopamine release. Additionally, as the reviewer suggested, we have revised our conclusion to state that “DAN-c1 is important for larval aversive learning,” rather than claiming it is both necessary and sufficient.

      (2) The GRASP system can label the contact of a cell in close proximity like synaptic contacts, but also other situations like no synaptic contact. It would be useful to use a more specific synaptic labelling tool, like the trans-synaptic tracing system (Talay et al., 2017 https://doi.org/10.1016/j.neuron.2017.10.011), which provides a better label of synaptic contact.

      We really appreciate the reviewer’s suggestion. First, we acknowledge that there are several general methods to reveal synaptic connections between neurons: immunohistochemistry (IHC), neuron labeling, viral tracing, GRASP, and electron microscopy (EM). Among these, IHC is not sufficiently convincing, viral tracing is challenging and rarely used in Drosophila, and EM, while the most accurate, is prohibitively expensive for our current goals. For these reasons, we chose the GRASP system to demonstrate the synaptic connections from dopaminergic neurons to the mushroom body. Second, we utilized an activity-dependent version of the GRASP system, linking split-GFP1-10 with synaptic proteins (e.g., synaptobrevin)[12] rather than with cell surface proteins like CD4 or CD8. This version significantly reduces false positive signals compared to the previous version, which was tagged with cell surface proteins. While we admit that this method does not provide as solid evidence of synaptic connections as EM, it is the most efficient method available to us for showing the synaptic connections from dopaminergic neurons to the mushroom body. Finally, we thank the reviewer for suggesting the literature on trans-synaptic tracing methods. Unfortunately, this method is not suitable for our goal, as it labels the entire postsynaptic neuron. In our study, we use GRASP to identify the specific dopaminergic neurons based on the synaptic locations and compartments within the mushroom body lobe. We require a labeling system at the subcellular level because, as noted, DAN-c1 forms synapses specifically in the lower peduncle (LP) of the mushroom body lobe, which is part of the axonal bundles from mushroom body neurons. Using the trans-synaptic tracing method would label the entire mushroom body, making it impossible to distinguish DAN-c1 from other DL1 dopaminergic neurons.

      (3) Previously, Honjo et al (2009) used a petri dish of 8.5 cm and a filter paper for reinforcement of 5.5 cm. In this study the petri dish was 10 cm and the size of the filter paper was not informed. That is important information because it will determine the probability of conditioning.

      A piece of filter paper (0.25cm<sup>2</sup> square) was used to hold odorants in this study. We have added this information to the Materials and Methods.

      (4) Statistic analysis of Behavioral performance of Fig 2H-I was made by ANOVA followed by Dunnett multiple comparisons test. Which was the control group? In each graph 2 independent Dunnett tests were performed against the DW control group?

      We have re-analyzed the data using a two-way ANOVA followed by Tukey’s multiple comparison test, as suggested by Reviewer #1. In Figure 2f-j (previously Figure 2h-l), the DW groups serve as the control groups. In our new analysis, we compared data across all groups using Tukey’s multiple comparison test, with particular focus on comparisons to the corresponding DW control groups.

      (5) The sample size in staining experiments of figures 1-4 were not informed.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures.

      (6) Color code in Fig 5 is missing, I assumed that is the same as in figure 4e

      We added color code in the figure legend of Figure 5.

      (7) Line 506 "0.1% QH solutions" should be 0.1% QUI solutions

      Changed.

      (8) There is no information on the availability of data

      We added Data Availability Statement: Data will be made available on request.

      Reviewer #3 (Recommendations for the authors):

      (1) Axes of behavioural experiments should better show the full span of possible values (-1;1) to allow a fair assessment.

      We have adjusted the axes in all learning assay graphs to a range from -1 to 1 for consistency and clarity.

      (2) Ns should better be given within the figures.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures. Additionally, Tables S4 to S6 include the N numbers for the learning assays. While we initially considered including the N numbers within the figure captions, we found it challenging to present this information clearly and efficiently. Therefore, we decided to summarize the N numbers in the tables instead.

      (3) Dot- or box-plots would be better for visualizing the data than means and SEMs.

      We agree with the reviewer’s suggestion. In the behavioral assay graphs, both dot plots and mean ± SEM have been included for better visualization of the data.

      (4) The paper reads as if Dop2R would reduce neuronal activity, rather than "just" cAMP levels. Such a misunderstanding should be avoided.

      We appreciate the reviewer’s comment. Under most conditions, dopamine binding to D2Rs activates the Gαi/o pathway, which inhibits adenylyl cyclase (AC) and reduces cAMP levels. This reduction in cAMP ultimately leads to decreased neuronal activity. In other words, D2R activation typically has an inhibitory effect on neurons. Additionally, D2R can exert inhibitory effects through other signaling pathways, such as inhibition of voltage-gated calcium channels via Gβγ subunits[5]. As noted in the Introduction, dopamine receptors are also involved in other signaling cascades, including PKC, MAPK, and CaMKII pathways. In the context of our study, based on the current understanding of molecular signaling in Drosophila olfactory associative learning, we continue to emphasize the D2R-mediated AC-cAMP-PKA signaling pathway as the most important one. However, we cannot rule out the involvement of other signaling pathways.

      (5) It would be better if citations were more clearly separated into ones that refer to adult flies versus work on larvae.

      We separated the citations related to adult flies from those working on larvae.

      (6) Line 81-83. DopECR is not found in mammals, is it?

      You are correct. DopECR is not found in mammals. This non-canonical receptor shares structural homology with vertebrate β-adrenergic-like receptors. It can be activated rapidly by dopamine as well as insect ecdysteroids[13,14].

      (7) Line 99: Better "a" learning center (some forms of learning work without mushroom bodies).

      We have revised the text from "the learning center" to "a learning center," as suggested by the reviewer.

      (8) Supplemental figures should be numbered according to the sequence in which they are mentioned in the text.

      We have rearranged the sequence of supplemental figures to match the order in which they are referenced in the text.

      (9) It is striking that dTRPA1-driving DANc1 is punishing in the water condition but that this effect does not summate with quinine punishment (but rather seems to impair it). Maybe you can back this up by ChR- or Chrimson-driving DANc1? Or by silencing DANc1 by GtACR1?

      We appreciate the reviewer’s suggestion. Indeed, we observed similar but not identical results when we used ChR2 to activate DAN-c1 during the training stage (Figure 5b and c). We found that activating DAN-c1 with quinine (QUI) impaired aversive learning (Figure 5b), consistent with our findings using dTRPA1 activation of DAN-c1 when trained in QUI at 34°C (Figure 2i). We propose that the over-excitation of DAN-c1, whether induced by QUI or artificial manipulation (optogenetics and thermogenetics), impairs aversive learning, which aligns with our findings for D2R knockdown (Figure 4e). However, there are some differences between dTRPA1 and ChR2 activation. While dTRPA1 activation induced aversive learning when trained with distilled water (DW) at 34°C (Figure 2i), ChR2 did not induce aversive learning under the same conditions (Figure 5c). We believe this difference is due to the varying activation levels between the two manipulations. Our optogenetic stimulus may have been stronger than the thermogenetic one, potentially leading to over-excitation in the DW group, preventing aversive learning. In the QUI group, the more severe over-excitation impaired aversive learning, producing a phenotype similar to that observed with other over-excitation methods (e.g., thermogenetics or D2R knockdown), where the phenotype reached a maximum level. We have also addressed these points in the Discussion section.

      (10) Unless I got the experimental procedure wrong, isn't it surprising that Figure S7b does not uncover a punishing effect of driving TH-GAL4 neurons?

      This optogenetic experiment with ChR2 expression in TH-GAL4 neurons was a pioneering attempt to activate DAN-c1 using ChR2. As explained in response to question (9), the failure to observe a punishing effect in the DW group when TH-GAL4 neurons were activated during training may be due to our optogenetic stimulus being too strong. This likely resulted in over-excitation of DAN-c1 (among the neurons labeled by TH-GAL4), impairing aversive learning and preventing the appearance of typical aversive behaviors.

      (11) It seems that Figure 1f´ is repeated, in a mirrored manner, in Figure 2e.

      We have removed Figure 2e, as it was deemed redundant and not necessary for this section.

      References

      (1) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (2) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. eLife 5 (2016). https://doi.org/10.7554/eLife.16135

      (3) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (4) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (5) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (6) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (7) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (8) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (9) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (10) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (11) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

      (12) Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015). https://doi.org/10.1038/ncomms10024

      (13) Abrieux, A., Duportets, L., Debernard, S., Gadenne, C. & Anton, S. The GPCR membrane receptor, DopEcR, mediates the actions of both dopamine and ecdysone to control sex pheromone perception in an insect. Front Behav Neurosci 8, 312 (2014). https://doi.org/10.3389/fnbeh.2014.00312

      (14) Lark, A., Kitamoto, T. & Martin, J. R. Modulation of neuronal activity in the Drosophila mushroom body by DopEcR, a unique dual receptor for ecdysone and dopamine. Biochim Biophys Acta Mol Cell Res 1864, 1578-1588 (2017). https://doi.org/10.1016/j.bbamcr.2017.05.015

    1. eLife Assessment

      In this highly innovative study, Carpenet C et al explore the use of nanobody-based PET imaging to track proliferative cells after in vivo transplantation in mice, in a fully immunocompetent setting. The development of a unique set of PET tracers and mouse strains to track genetically-unmodified transplanted cells in vivo is an important novel asset that could potentially facilitate cell tracking in different research fields. The evidence provided is compelling as the new method proposed might facilitate overcoming certain limitations of alternative approaches, such as full sized immunoglobulins and small molecules.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of nanobody-based PET imaging is important, and holds great potential for real-world applications since nanobodies have many advantages over full sized immunoglobulins and small molecules.

      Strengths:

      The submitted manuscript contains quite a bit of interesting data from a collaborative team of well-respected researchers. The authors are to be congratulated for presenting results that may not have turned out the way they had hoped, and doing so in a transparent fashion.

      Weaknesses:

      However, the manuscript could be considered to be a collection of exploratory findings rather than a complete and mature scientific exposition. Most of the sample sizes were 3 per group, which is fine for exploratory work, but insufficient to draw strong, statistically robust conclusions for definitive results.

      Overall, the following specific limitations are noted as suggestions for future work:

      (1) The authors used DFO, which is well known to leak Zr, rather than the current standard for 89Zr PET which is DFO* (DFO-star)

      (2) The brain tissues were not capillary depleted, which limits interpretation. Capillary depletion, with quantitative assessment of the completion of the depletion process, is the standard in the field.

      (3) The authors have not experimentally tested the hypothesis that the PEG adduct reduced BBB transcytosis.

      (4) The results in Fig. 7 involving the placenta are interesting, but need confirmation using constructs with 18F labeling and without the PEG adduct.

      (5) If this line of investigation were to be translated to humans, an important consideration would be the relative safety of 89Zr and 64Cu. It is likely to be quite a bit worse than for 18F, since the 89Zr and 64Cu have longer half-lives, dissociate from their chelators, and lodge in off-target tissues.

      (6) A surprising and somewhat disappointing finding was the modest amount of BBB transcytosis. Clearly additional work will be needed before nanobody-based brain PET becomes feasible.

    3. Reviewer #2 (Public review):

      Summary:

      In this study the authors described a previously developed set of VHH-based PET tracers to track transplants (cancer cells, embryos) in a murine immune-competent environment.

      Strengths:

      Unique set of PET tracers and mouse strains to track transplanted cells in vivo without genetic modification of the transplanted cells. This is a unique asset and a first-in-kind.

      Weaknesses:

      None

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Overall, the manuscript could be clearer and more beneficial to the readers with the following suggested revisions:

      (1) The abstract should include information on the comparative performance of 89Zr, 64Cu and 18F labeled nanobodies, especially noting the challenges with DFO-89Zr and NOTA-64Cu.

      (2) The abstract should explicitly note the types of transplants assessed and the specific PET findings.

      (3) The abstract should note the negative results in terms of brain PET findings.

      We thank reviewer 1 for these three suggestions. We have now included this information in the abstract.

      (4) Based on the data shown in Fig. 1 and Table 1, it seems that the nanobodies bind to quite a few proteins other than TfR. This should be discussed frankly as a limitation.

      The presence of multiple other bands and proteins identified by LC/MS in Figure 1 is typical for immunoprecipitation experiments, as performed under the conditions used: all proteins other than TfR that are identified in Table 1 are abundant cytoplasmic (cytoskeletal) and/or nuclear proteins. More rigorous washing would perhaps have removed some of these contaminants at the risk of losing some of the specific signal as well. We have added a comment to this effect. In an in vivo setting, this would be of minor concern, as these proteins would be inaccessible to our nanobodies. In fact, when VHH123 radioconjugates are injected in huTfr+/+ mice (or VHH188 in C57BL/6), we observe no specific signal – which supports this conclusion.

      We therefore state: “We show that both V<sub>H</sub>Hs bind only to the appropriate TfR, with no obvious cross-reactivity to other surface-expressed proteins by immunoblot, LC/MSMS analysis of immunoprecipitates, SDS-PAGE of <sup>35</sup>S-labelled proteins and flow cytometry (Fig 1; Table 1).” We have added some clarification to make this clearer, and the full LC/MSMS data tables are now included in the supplemental materials as Supplementary Table 1. We have included subcellular localization information for each protein identified through LC/MSMS in Table 1 as well.

      (5) Why did the authors use DFO, which is well known to leak Zr, rather than the current standard for 89Zr PET, DFO* (DFO-star)?

      We used DFO rather than DFO-star for two reasons: 1) we had already conducted and published numerous other studies using DFO-conjugated nanobodies and had not observed any release of <sup>89</sup>Zr, and 2) commercially sourced, click-chemistry-enabled DFO-star (such as DFO*-DBCO) was not available at the time of the study.

      (6) Figure 2B appears to show complex structures, more complex than just GGG-DFO-azide and GGG-NOTA-azide. This should be explained in detail.

      We have added two supplemental figures and methods that recapitulate how we generated what we have termed as GGG-DFO-Azide and GGG-NOTA-Azide. We have updated the legend of Figure 2B.

      (7) Why is there a double band in Suppl. Fig 9 for VHH123-NOTA-Azide?

      Under optimal conditions, sortase A-mediated transpeptidation is efficient, resulting in the formation of a peptide bond between the C-terminally LPETG-tagged protein and the GGG-probe. However, extended reaction times or suboptimal concentrations of modified GGG-probes (which are often in limited supply) in the reaction mixture allow hydrolysis of the sortase A-LPET-protein intermediate. The hydrolysis product can no longer participate in a sortase A reaction. This explains the doublet in the reaction used to generate VHH123-NOTA-N<sub>3</sub>: the upper band is VHH123-NOTA-N<sub>3</sub> and the lower band is the hydrolysis product, VHH123-LPET, which is unable to react with PEG<sub>20kDa</sub>-DBCO (the lower band that appears at the same position of migration in the next lane on the gel). We noticed that an adjacent lane was mislabelled as ‘VHH188-NOTA-PEG<sub>20kDa</sub>’ when in fact it was ‘VHH123-NOTA-PEG<sub>20kDa</sub>’. This has been corrected.

      The hydrolysis product, VHH123-LPET, has a short circulatory half-life and obviously lacks the PEG moiety as well as the chelator. It therefore cannot chelate <sup>64</sup>Cu. Its presence should not interfere with PET imaging. Since all animals were injected with the same measured dose of <sup>64</sup>Cu-labeled conjugate, the presence of an unlabeled TfR-binding competitor in the form of VHH123-LPET – at a << 1:1 molar ratio to the labelled nanobody – would be of no consequence.

      (8) More details should be provided about the tetrazine-TCO click chemistry for 18F labeling.

      We have added supplementary methods and figures that detail how <sup>18</sup>F-TCO was generated. For the principle of TCO-tetrazine click-chemistry, a brief description was added in the text, as well as a reference to a review on the subject.

      (9) For the data shown in Figure 3H, the authors should state whether the brain tissues were capillary depleted, and if so, how this was performed and how complete the procedure was.

      No capillary depletion of the brain tissues was performed, as this was challenging to perform in compliance with the radiosafety protocols in place at our institution. We have updated the legend of figure 3H and methods to include this important detail. Whole blood gamma-counting did not show any obvious difference of activity across the 4 groups in figure 3G (same mice as in figure 3H), which would go against the interpretation that activity differences in the brain (figure 3H) are solely attributable to residual activity from blood in the capillaries.

      (10) The authors should experimentally test the hypotheses that the PEG adduct reduced BBB transcytosis.

      Reviewer 1 is correct to point out that we have not tested un-PEGylated conjugates of <sup>64</sup>Cu and <sup>89</sup>Zr with the anti-TfR nanobodies and we currently do not have the means to perform additional experiments. However, the <sup>18</sup>F conjugates were not PEGylated, and these also fail to show any detectable signal in the CNS by PET/CT (see figure 4A). PEGylation alone cannot be the sole factor that limits transcytosis across the BBB.

      (11) It was interesting to note that the Cu appears to dissociate from the NOTA chelator. The authors should provide more information about the kinetics of this process.

      We have not tested the kinetics of dissociation between <sup>64</sup>Cu and the NOTA conjugates in vitro, like we have done for <sup>89</sup>Zr and DFO (supplemental figure 2), because previous work (see references 35 and 36 by Dearling JL and Mirick GR and colleagues) has shown that NOTA and other copper chelators tend to release free copper radioisotopes in the liver, a commonly reported artifact. We have also included a new set of images that show the biodistribution of VHH123-NOTA-<sup>64</sup>Cu in huTfR+/+ mice, where we still observe a substantial signal in the liver, indicating release of <sup>64</sup>Cu from NOTA, in the absence of the anti-TfR VHH binding to its target. This was clearly not seen using the DFO-<sup>89</sup>Zr conjugates. Binding of the VHH to TfR, followed by internalization, appears to be required for the release of <sup>89</sup>Zr from DFO, prompting us to investigate this phenomenon further.

      (12) The authors should increase the sample size, and test two different radiolabels for the transplant imaging results (Figs. 5 and 6), since these seem to be the ones they feel are the most important, based on the title and abstract.

      We agree with reviewer 1 that more repeats would increase the significance of our findings, but we unfortunately do not have the means of performing additional experiments at this time (the lab at Boston Children’s Hospital has closed as Dr. Ploegh has retired). We believe that the results are compelling and will be of use to the in vivo imaging community.

      (13) Fig. 6G appears to show a false positive result for the kidney imaging. Is this real, or an artifact of small sample size?

      We agree with reviewer 1 that the kidney signals in figure 6 are somewhat puzzling. The difference between the tumor-bearing mice that received VHH123 and VHHEnh conjugates is not significant – with the obvious caveat that the VHHEnh group is comprised of only 2 mice, so sample size may well be a factor here. If we compare the signals of the VHH123 conjugate in tumor-bearing mice vs. tumor-free mice, the VHH123 conjugates would have cleared much faster in the tumor-free mice over 24 hours (since no epitope is present for VHH123 to bind to), thus weakening the kidney signal observed after 24 hours. The same would be true for all the other tissues – except for the liver (where free <sup>64</sup>Cu that leaks from NOTA accumulates). VHHEnh conjugates in tumor-bearing mice show a significant kidney signal – although no VHH123 target epitope is present in these mice. B16.F10 tumors at 4 weeks of growth tend to be necrotic and can passively retain any radiotracer – this generates the weak lung signal visible in Fig 6D – thus the radiotracer would clear at a slower rate than VHH123 conjugates in tumor-free mice, giving a higher kidney signal at 24 hours.

      No tumors were found in the kidneys post-necropsy. We attribute the differences in kidney signals to different kinetics of clearance of the radioconjugates. We have added this explanation to the results and discussion.

      (14) Are the results shown in Fig. 7 generalizable? The authors should test the constructs with 18F labeling and without the PEG adduct.

      We agree with reviewer 1 that it would be very interesting to confirm these observations using 18F radioconjugates. The results should be generalizable, as the difference between signals can only be attributed to the presence of the recognized epitope in the placenta – which is in fact the only variable that differs between the two groups. At the time of conducting the study, we had not planned to perform the same experiments with 18F radioconjugates – partly because synthesis of 18F radioconjugates is more challenging (and costly) than the production of 89Zr-labeled nanobodies.

      (15) The authors should discuss the relative safety of 89Zr and 64Cu. It is likely to be quite a bit worse than for 18F, since the 89Zr and 64Cu have longer half-lives, dissociate from their chelators, and lodge in off-target tissues. An alternative interpretation of the authors' data could be that 89Zr and 64Cu labeling in this context are unsuitable for the stated purposes of PET imaging. In this case, the key experiments shown in Figs. 5-7 should be repeated with the 18F labeled nanobody constructs.

      Our vision was to offer a tool to the scientific community interested in in vivo tracking of cells in different preclinical disease models. The question of safety regarding 89Zr and 64Cu for clinical use was therefore not a factor we then considered. However, we have now included a section in the discussion about the potential safety issue of <sup>89</sup>Zr release and bone accumulation in clinical settings, especially for radioconjugates that target an internalizing surface protein.

      (16) The authors should remark on the somewhat surprisingly modest amount of BBB transcytosis in the discussion. What were the affinities of the nanobodies?

      The affinities and binding kinetics of both nanobodies were described in a separate work that is referenced in the introduction (references 21 and 22 by Wouters Y and colleagues). Through other methods that rely on a highly sensitive bio-assay, it was shown that both VHH123 and VHH188 are capable of transcytosis: both nanobodies coupled to a neurotensin peptide induced a drop of temperature after i.v. injection in matching mouse strains (VHH123 in C57BL/6 and VHH188 in huTfr +/+). The lack of any compelling CNS signal by PET/CT is discussed in the manuscript.

      (17) More details of the methods should be provided in the supplement.

      a. What was the source of the penta-mutant Sortase A-His6?

      Sortase A pentamutant is produced in-house, by cytoplasmic expression in E. coli (BL21 strain), using a plasmid vector encoding a truncated and mutated version of Sortase A. References were added, as well as the Addgene repository number (51140).

      b. What was the yield of the sortase reactions?

      For small proteins, such as nanobodies/V<sub>H</sub>Hs, we find that the yield of a sortase A reaction typically is > 75%. This is what we observed for all our conjugations. The methods section was updated to include this information.

      c. What was the source of the GGG-Azide-DFO and GGG-Azide NOTA? Based on the structures shown in Fig. 2, these appear to be more complex than was noted in the text.

      We have now detailed the synthesis of GGG-DFO-Azide and GGG-NOTA-Azide in the supplementary methods.

      d. More details about the source and purity of the tetrazine and TCO labeling reagents should be provided.

      We have included information on the synthesis of GGG-tetrazine in the supplementary methods. Concerning the synthesis of <sup>18</sup>F-TCO, we have also included a detailed description of the compound in supplementary methods. The reaction between GGG-tetrazine and <sup>18</sup>F-TCO is now further detailed in the manuscript.

      e. The TCO-agarose slurry purification should be explained in more detail, and the results should be shown.

      We have included a detailed procedure of how the TCO-agarose slurry purification was performed in the methods sections. We had already included the Radio-Thin Layer Chromatography QC data of the final VHH123-18F and VHH188-18F purifications in the supplementary figures – which are obtained immediately after TCO-agarose slurry purification. The yields of the TCO-agarose slurry purification, in terms of activity of each collected fraction, are now detailed in the methods section.

      f. The CT parameters should be provided.

      We have now added more information about the PET/CT imaging procedure in the methods section of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Authors should discuss the possibility of the TfR as a rejection antigen. Murine TfR is foreign for hTfR+/+ mice and vice versa.

      We had not discussed this possibility, as we believe the risk of rejection of huTfR+ cells in moTfR+ mice (or vice versa) is negligible. The cells and mice are of the same genetic background – save for the coding region of the ectodomain of the TfR (spanning amino acids ~194 to 390 of the full length TfR, which is 763 AA). The pairwise identity of the human and mouse TfR ectodomains is 73% after alignment of both AA sequences using Clustal Omega. We agree that we cannot formally exclude the possibility of an immune rejection, and have now mentioned this possibility in the discussion.
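      As an illustration of what a pairwise identity figure such as the 73% quoted above measures, the following is a minimal sketch that computes percent identity from two already-aligned sequences. It is not the authors' procedure: the function name `percent_identity`, the choice to ignore gapped columns in the denominator, and the toy sequences are all assumptions made for illustration (alignment tools differ in how they treat gaps, and the sequences below are not TfR sequences).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are skipped, so the
    denominator is the number of residue-vs-residue columns. Other conventions
    (e.g. dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the two residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    seq_a = "ACDE-FG"
    seq_b = "ACDQKFG"
    print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

      With the toy alignment, five of the six ungapped columns match. The real comparison would use the aligned human and mouse ectodomain sequences exported from the alignment tool.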

      Is there any clinical use of the anti-human TfR receptor PET tracer?

      We do not currently envision an application for the anti-human TfR VHH in PET/CT in a clinical setting.

      Why is the in vivo anti-mouse TfR uptake level in C57BL/6 mice consistently higher than the anti-human TfR receptor PET tracer in hTfR+/+ mice? Is this due to differences in characteristics of the VHH's (e.g. affinity, internalization properties), or rather due to a different biological behavior of the hTfR-transgene (e.g. reduced internalization properties)?

      We indeed observed that VHH123 uptake and binding appears to be more robust than that of VHH188 to their respective targets. Moreover, at later times post-injection (> 48h), VHH188 appears to display a very low reactivity to C57BL/6 (moTfR+) cells (see Figure 3B). We attribute this to the respective affinities and specificities of both VHHs. We have not investigated the VHH binding kinetics of the mouse versus human ectodomain TfR proteins in vitro. Internalization should be mildly different at best, as <sup>89</sup>Zr release from DFO occurs with both VHHs in both C57BL/6 and huTfR +/+ mouse models (when injected in a matched configuration). The huTfR +/+ mice rely exclusively on the huTfr for their iron supply. They are healthy with no obvious pathological features. The behavior of the huTfr is therefore presumably similar, if not identical to that of the mouse Tfr, bearing in mind that the huTfr and the mouse Tfr are both reliant on mouse Tf as their ligand.

      The anti-TfR VHHs were initially developed as a carrier for BBB-transport of VHH-based drug conjugates (previous publications). The data shown here reduces enthusiasm towards this application. Uptake in the brain is several log-factors lower than physiological uptake elsewhere. Potential consequences of off-brain uptake on potential toxicity of VHH-based drug-conjugates could be better emphasized in the discussion.

      We did not observe a significant presence of the anti-TfR VHHs in the CNS by PET/CT. We have addressed several possibilities: longer circulation times post-injection may favor transcytosis of the VHHs through the BBB. However, because transcytosis requires endocytosis, <sup>89</sup>Zr may be released from its chelating moiety at this step. The only radiotracers with a covalent bond between the radio-isotope and the VHHs in our work are the <sup>18</sup>F VHHs, but the signal acquisition window may have been too short to observe transcytosis and accumulation in the CNS. Another possible caveat is that PEGylation of the radiotracers may be an obstacle to transcytosis. The circulatory half-life of unpegylated VHHs is too short to allow adequate visualization after 24 hours post-injection, as the conjugates rapidly clear from the circulation (t½ = 30 minutes or less). We have updated the discussion to address these points.
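      The two time constraints mentioned above (rapid clearance of unPEGylated VHHs and the short acquisition window available for <sup>18</sup>F) can be made concrete with a small half-life calculation. This is a rough sketch, not the authors' analysis: it assumes simple mono-exponential behavior, uses the 30-minute circulatory half-life quoted in the response as an upper bound, and uses commonly tabulated physical half-lives (about 109.8 min for 18F, 12.7 h for 64Cu, 78.4 h for 89Zr). The function and dictionary names are hypothetical.

```python
def fraction_remaining(elapsed_min: float, half_life_min: float) -> float:
    """Fraction left after `elapsed_min`, assuming mono-exponential loss."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_min / half_life_min)


# Approximate physical half-lives in minutes (commonly tabulated values).
PHYSICAL_HALF_LIFE_MIN = {
    "18F": 109.8,
    "64Cu": 12.7 * 60,
    "89Zr": 78.4 * 60,
}

# Circulatory half-life of unPEGylated VHHs quoted in the response
# ("30 minutes or less"); treated here as an upper bound.
CIRCULATORY_HALF_LIFE_MIN = 30.0

if __name__ == "__main__":
    t = 24 * 60  # 24 h post-injection, in minutes

    # Share of an unPEGylated VHH still circulating after 24 h (48 half-lives).
    print(f"in circulation at 24 h: {fraction_remaining(t, CIRCULATORY_HALF_LIFE_MIN):.2e}")

    # Share of the injected radioactivity that has not yet decayed at 24 h.
    for isotope, half_life in PHYSICAL_HALF_LIFE_MIN.items():
        print(f"{isotope}: {fraction_remaining(t, half_life):.2e} of activity left at 24 h")
```

      Under these assumptions essentially nothing of an unPEGylated VHH remains in circulation at 24 hours, and 18F activity at 24 hours is on the order of 10^-4 of the injected activity, whereas 64Cu and 89Zr retain a usable fraction. This is consistent with PEGylation and longer-lived isotopes being needed for late timepoints.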

      In several locations (I have counted 5) a space is missing between words, please double-check.

      We carefully checked the manuscript to remove any remaining typos.

      It is unclear to me why for the melanoma-tracking experiment the tracer is switched from the 89Zr-labeled variant to the 64Cu-labeled variant.

      The decision to switch to the <sup>64</sup>Cu labeled VHHs for the melanoma experiment stemmed from a wish to 1) evaluate the performance of the <sup>64</sup>Cu-radioconjugates in detecting transplanted cells as we had done with the <sup>89</sup>Zr conjugates and 2) assess how the (non-specific) liver signal seen with <sup>64</sup>Cu contrasts with a specific signal.

      typo in discussion: C57BL/6 instead of C57B/6

      We have corrected the typo.

      It is unclear to me why in FIG1B cells are labeled with 35S. Is it correct that the signals seen are due to staining membranes with anti-TfR mAbs? Or is this an autoradiography of the gel?

      In Figure 1B cells were labeled with 35S-Met/Cys, while the images shown are indeed those of Western Blots, using an anti-TfR monoclonal antibody as the primary antibody to detect human and mouse TfR retrieved by the anti-TfR VHHs. Autoradiography using the same lysates showed the presence of contaminants in the VHH eluates, as commonly seen in immunoprecipitates from metabolically labeled cells (as distinct from IP/Westerns). For this reason, we performed a Western Blot on the same samples to confirm TfR pull-down. As written in the results section, we also performed LC/MS analysis of the immunoprecipitated proteins to better characterize contaminating proteins (Table 1). To clarify this, we have now added the autoradiographs in supplementary data (supplementary figure 15) and added a reference to these observations in the results.

      ROI quantifications in all figures: these should be expressed as %ID/cc instead of %ID/g. Ex vivo tissue counts should be in %ID/g instead of cpm.

      We have converted all ROI quantification figures to %ID/cc based on the assumption that 1 mL (1 cc) = 1 g. For ex vivo tissue counts, %ID/g has been calculated based on injected dose (except for figure 3G, where the comparisons in %ID/g are not possible due to the uncertain nature of bone marrow and whole blood). All figures have now been updated.
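      The unit conventions discussed above can be written out explicitly. The sketch below shows the standard definitions of %ID/g (ex vivo counting) and %ID/cc (image ROI), together with the 1 mL = 1 g density assumption stated in the response. It is illustrative only: the function names and the example numbers are hypothetical, and it assumes both activities are decay-corrected to the same reference time and expressed in the same units.

```python
def percent_id_per_gram(tissue_activity: float, injected_activity: float,
                        tissue_mass_g: float) -> float:
    """Ex vivo uptake as percent injected dose per gram of tissue.

    Assumes both activities are decay-corrected to the same time point
    and share the same unit (e.g. kBq).
    """
    if injected_activity <= 0 or tissue_mass_g <= 0:
        raise ValueError("injected activity and tissue mass must be positive")
    return 100.0 * tissue_activity / injected_activity / tissue_mass_g


def percent_id_per_cc(roi_concentration_per_ml: float, injected_activity: float) -> float:
    """Image-derived uptake as percent injected dose per cubic centimeter.

    `roi_concentration_per_ml` is the mean ROI activity concentration
    (e.g. kBq/mL) in the same activity unit as `injected_activity`.
    """
    if injected_activity <= 0:
        raise ValueError("injected activity must be positive")
    return 100.0 * roi_concentration_per_ml / injected_activity


def id_per_cc_to_id_per_g(pct_id_per_cc: float, density_g_per_ml: float = 1.0) -> float:
    """Convert %ID/cc to %ID/g; the default density of 1 g/mL is the
    soft-tissue approximation stated in the response."""
    if density_g_per_ml <= 0:
        raise ValueError("density must be positive")
    return pct_id_per_cc / density_g_per_ml


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(percent_id_per_gram(tissue_activity=50.0, injected_activity=1000.0, tissue_mass_g=0.5))
    print(percent_id_per_cc(roi_concentration_per_ml=20.0, injected_activity=1000.0))
```

      The unit-density assumption is reasonable for most soft tissues but less so for bone or marrow, which may be one reason why a per-gram comparison is awkward for samples such as those in figure 3G.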

      Fig4: it would be good to also see respective mouse controls (C57BL6 vs hTfR+/+) for the 64Cu- and 18F-labeled VHH123 tracers. Each radiolabeling methodology changes in vivo biodistribution and specificity, which can be better assessed by using appropriate controls.

      We had performed these controls but they were not included in the manuscript as they were deemed redundant with the results of Figure 3. We have now separated Figure 4 in two panels (Figure 4A and 4B) with figure 4A showing the 1h timepoint post-injection of VHH123 radiotracers in C57BL/6 vs huTfr<sup>+/+</sup> and Figure 4B showing the 24h timepoint in the same configuration. ROI analyses were also done on the huTfR<sup>+/+</sup> controls and were included in Figure 4C as well.

      Fig7: is it correct that mouse imaging is performed at 24h p.i. and dissected embryos at 72h p.i.? Why are there 2 days between each procedure of the same animals?

      We acquired images at different timepoints, specifically at 1h, 24h, 48h and 72 hours after radio-tracer injection. As 72 h was the last timepoint, the mice were sacrificed the same day and embryo dissection performed thereafter, at 72 hours post radiotracer injection. We decided to show the 24h timepoint images as they were the most representative of the series, offering the best signal-to-noise ratio. The signal pattern did not change over the course from 24h to 72h. We have now added those timepoints in the supplementary data.

    1. eLife Assessment

      This study focuses on a previously reported positive correlation between translational efficiency and protein noise. Using mathematical modeling and analysis of experimental data the authors reach the valuable conclusion that this phenomenon arises due to ribosomal demand. While some aspects of the work appear to be incomplete, the results have the potential to be of value and interest to the field of gene expression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling and new experiments to explore the relationship between protein expression noise, translation efficiency and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Weaknesses:

      My main concern is the analysis presented in Figure 4. This is the core of the mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. Revisions have improved clarity, but I am confused by the assumptions used in the mathematical modelling of this section. As I said before, the authors' assumption that fluctuations in the mRNA levels of a single gene will significantly affect ribosome demand is puzzling. The authors seem to dismiss this, and maybe I am missing something. However, the specific forms used in the equations of Table S1 seem very phenomenological, and I am not sure how these can be taken as good approximations for modelling ribosome demand. Why does kc have this specific form? Why is such a sharp Hill coefficient appropriate? How many total ribosomes per mRNA are assumed here (if this assumption is indeed needed)? Again, my intuition is that on average the total level of mRNA across all genes would stay constant, and therefore there are no big fluctuations in ribosome demand due to the burstiness of transcription of individual genes (as this is, on average, compensated by a drop in the level of other transcripts). Should one not consider all transcripts and total ribosomes to be able to model ribosome demand?

    3. Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I am aware. Nevertheless, I found the evidence for the main idea, that ribosome demand generates this correlation, to be weak. Below are my major and minor comments.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      (5) The conclusions from Figure 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Eq. (4) in Paulsson, Physics of Life Reviews 2005.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figure 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      Comments on revisions:

      Updated Review: The authors have satisfactorily answered all of my questions and comments. The current manuscript is much clearer and stronger than the previous one. I do not have any other questions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain mostly the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. This would increase ribosome demand. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells. We have now rephrased the section between the lines 321 and 331 to clarify this point.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the present time point and the previous time point in the stochastic simulations, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the transcriptional burst frequency and the burst size, as well as the rate of mRNA removal. We have now incorporated more details about the modelling part, along with explanations of the parameters and terms, in the revised manuscript (lines 390 to 411; lines 795 to 807).

      (2) Overall, the paper is very long and, as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime in which protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would be able to better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B shows noise as Distance to Median (DM) that can be positive or negative. It is therefore misleading that the authors say there is a 10-fold increase in noise (this would be relevant if the quantity was strictly positive). How is the 10-fold estimated? Similar comments apply to Figure 1F and the estimated 37-fold. I also wonder if the datasets combined from different studies are necessarily compatible.

      We have now changed the statements and mentioned the actual noise values for different classes of genes rather than the fold-changes (lines 111-113 and 143-145). We agree that the measurements for mRNA expression levels, protein synthesis rates and protein noise were obtained from experiments done by different research labs, and this could introduce more variation in the data. However, the experimental variations are likely to be random and are unlikely to bias any specific class of genes (in Fig. 1B and Fig. 1F) more than others.

      (2) How Figure 1D has been generated seems confusing: the authors state this is based on the Gillespie algorithm, but in panel 1C and also in the methods, they are writing ODEs and Equations 3 and 4 stating the Euler method for the solution of ODEs. Also, I am concerned whether this has been done at steady state. The protein noise for the two-state model can be analytically obtained, and instead of simulations, the authors could have just used the expression. Also, Figure 1D shows CV while the corresponding data in Figure 1B show mean-adjusted DM. So, I am not sure if the comparison is valid. I am also very confused about the fact that the authors show CV does not depend on the mean expression of proteins and mRNA. Analytical solutions suggest that an inverse relationship always exists between CV and mean, and this has also been experimentally observed (see for example Newman et al 2006).

      We used the Gillespie algorithm for stochastic simulations and identified the time points when an event (for example, switching to ON or OFF states during transcriptional bursting) occurred. If an event occurred at a time point, the rates of the reactions were guided by Equations 3 and 4, as the rates of reactions were dependent on the number of mRNA (or protein) molecules present, production rates and removal rates.

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise (Newman et al., 2006). For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene.

      The work of Newman et al. (2006) measures noise values of different genes with different transcriptional burst characteristics and different mRNA and protein removal rates. We also see similar results in our simulations (Fig. 1E), where the protein noise goes down as we increase the mean expression by changing the transcriptional burst frequency.
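
      As a reader's aid, the simulation procedure described in this response can be illustrated with a minimal Gillespie simulation of the two-state (ON/OFF) promoter model that reports noise as the coefficient of variation (CV). This is only a sketch under stated assumptions and not the authors' code: the function name, the parameter names (k_on, k_off, beta_m, alpha_m, beta_p, alpha_p) and all numerical values are hypothetical, and translational bursting, TASEP and ribosome demand are not modelled.

```python
import bisect
import math
import random


def gillespie_two_state(k_on, k_off, beta_m, alpha_m, beta_p, alpha_p,
                        t_end, burn_in_frac=0.2, seed=0):
    """Simulate one cell under a two-state promoter model (illustrative only).

    Returns the time-weighted mean and CV of the protein copy number.
    Assumes k_on > 0 and k_off > 0, so some reaction is always possible.
    """
    rng = random.Random(seed)
    t, on, m, p = 0.0, 0, 0, 0
    burn_in = burn_in_frac * t_end          # discard the initial transient
    weight = sum_p = sum_p2 = 0.0

    while t < t_end:
        # Propensities depend on the promoter state and current copy numbers.
        rates = [
            k_on * (1 - on),   # promoter OFF -> ON
            k_off * on,        # promoter ON -> OFF
            beta_m * on,       # mRNA production (only while ON)
            alpha_m * m,       # mRNA removal
            beta_p * m,        # protein production
            alpha_p * p,       # protein removal
        ]
        cumulative = []
        total = 0.0
        for rate in rates:
            total += rate
            cumulative.append(total)

        dt = rng.expovariate(total)         # waiting time to the next event

        # Weight the current protein number by the time spent in this state.
        start, end = max(t, burn_in), min(t + dt, t_end)
        if end > start:
            w = end - start
            weight += w
            sum_p += w * p
            sum_p2 += w * p * p
        t += dt

        # Choose a reaction with probability proportional to its propensity.
        r = total * (1.0 - rng.random())    # r lies in (0, total]
        idx = bisect.bisect_left(cumulative, r)
        if idx == 0:
            on = 1
        elif idx == 1:
            on = 0
        elif idx == 2:
            m += 1
        elif idx == 3:
            m -= 1
        elif idx == 4:
            p += 1
        else:
            p -= 1

    mean = sum_p / weight
    var = max(sum_p2 / weight - mean * mean, 0.0)
    cv = math.sqrt(var) / mean if mean > 0 else float("nan")
    return mean, cv


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    mean, cv = gillespie_two_state(k_on=1.0, k_off=1.0, beta_m=10.0,
                                   alpha_m=1.0, beta_p=10.0, alpha_p=0.5,
                                   t_end=1000.0)
    print(f"mean protein = {mean:.1f}, CV = {cv:.2f}")
```

      With these illustrative values, the steady-state mean expected from the rate equations is (k_on / (k_on + k_off)) x (beta_m / alpha_m) x (beta_p / alpha_p) = 100 proteins, which offers a quick sanity check on the simulated mean.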

      (3) Estimating parameters of gene expression using reference 44 ignores the effect of variability in capture efficiency and cell size. In a recent paper, Tang et al Bioinformatics 39 (7), btad395 2023 addressed this issue.

      Thank you for referring to the work of Tang et al. (2023). We note that the cell size and capture efficiency have a small effect on the burst frequency (Kon) but have a more pronounced effect on burst size (Tang et al., 2023). In our analysis, we considered only burst frequency, and even with likely small inaccuracies in our estimation of Kon, we can capture interesting associations of burst frequency with noise trends.

      (4) In the methods "αp = 0.007 per mRNA molecule per unit time", I believe it should be per protein molecule per unit time.

      Corrected.

      (5) Figure 3 uses TASEP modelling but the details of this modelling are not described well.

      We have now expanded the description of the modelling approach in the revised manuscript (lines 391-412; lines 693-776 and lines 797-809). In addition, we have also added more details in the figure captions.

      (6) Another overall issue is that when the authors talk about changes in burst frequency or changes in translation efficiency, it is not always clear, is this done while keeping all the other parameters constant therefore changing mean expressions, or is this done by keeping the mean expressions constant?

      To test for the association between mean protein expression and protein noise, we varied the mean expression by changing the translation initiation rate (TLinit) for most of the manuscript while keeping other parameters constant. In Figure 5, where we decoupled TLinit from the ribosome traversal rate (V), we changed the mean protein expression by changing the ribosome traversal rate while keeping other parameters constant. We have now mentioned this in the manuscript.

      (7) I believe Figures 5 and 6 present the same data in different ways; I wonder if these can be combined or if some aspect of the data in Figure 5 could go to the supplementary material. Also, it is not clear what the statistical tests in Figures 5E and F are testing.

      We have now moved Figures 5E and 5F to the supplement (Fig. S20). We have also added details of the statistical test in the figure caption.

      Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I am aware. Nevertheless, I found the evidence for the main idea, that ribosome demand generates this correlation, to be weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence for the hypothesis would require generation of ribosome occupancy maps of mRNA molecules of specific genes at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there is other evidence that supports our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, the ribosome demand model made two predictions, both of which could be validated through modelling as well as from our experiments.

      To further investigate whether removing ribosome demand from our model could eliminate the positive mean-noise correlation for a gene, we have now tested two additional sets of models where we decoupled the translation initiation rate (TLinit) from the ribosome traversal speed (V). In the first model, we changed the mean protein expression by changing the translation initiation rate but keeping the ribosome traversal speed constant. Thus, in this scenario, ribosome demand varied according to the variation in the translation initiation rate. As expected, the positive correlation between mean expression and protein noise was maintained in this condition (Fig. 5B). In the second model, we changed the mean expression by changing the ribosome traversal speed but keeping the translation initiation rate (and therefore, the ribosome demand) constant. In this situation, the relationship between mean expression and protein noise turned negative (Fig. 5B and fig. S16). These results further indicated that ribosome demand was indeed driving the positive relationship between mean expression and protein noise.

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      We agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, but it has been observed in studies across bacteria, yeast, and Arabidopsis (Ozbudak et al., 2003; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the relative strength of these associations, but to understand the molecular basis of the influence of translational efficiency on protein noise.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We have revised the figure captions to include more details as per the reviewer’s suggestion.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise. For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene. We now mention this in lines 123-124. We used the protein synthesis rate per mRNA as the measure of translational efficiency (Riba et al., 2019; line 100). Alternatively, we also used the tRNA adaptation index (tAI) as a measure of translational efficiency, as codon choice could also influence the translation rate per mRNA molecule (Tuller et al., 2010) (line 193).

      The protein noise was quantified from the signal intensity of GFP-tagged proteins (Newman et al., 2006; and our data), which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they may not be new, but we included these results to set the baseline for comparison with simulation results that appear in the later part of the manuscript, where we included translational bursting and ribosome demand in our models.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a basal translation initiation rate depending on the mRNA numbers and other variables. We changed the basal translation initiation rate to alter the mean protein expression levels. We have now elaborated the modelling section to incorporate these details in the revised manuscript (lines 404 to 412).

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      It is an important observation. Even though we changed the basal translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model that the changes in the translation initiation rate were also linked to changes in the translation elongation rate (Fig. 3D). Thus, an increase in the translation initiation rate was associated with faster ribosome traversal through an mRNA molecule. This has also been observed in an experimental study by Barrington et al. (2023). Therefore, the models can also be expressed in terms of the translation elongation rate or ribosome traversal speed, instead of the translation initiation rate, and this modification will not change the results of the simulations due to the interconnectedness of the initiation rate and the elongation rate.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) The discussion from lines 180 to 182 appears consistent with Figure 1E. It seems that the two-state model can already explain why the genes with high burst frequency and high protein synthesis rate showed a small protein noise. The purpose of this discussion is unclear to me.

      Yes, the results from Fig. 1E were from stochastic simulations, whereas the results discussed in the lines 191 to 193 (in the revised manuscript) were based on our analysis of experimental data that is shown in Fig. 2D.

      (2) If I understand correctly, "translational efficiency" is the same as "protein synthesis rate" in this work. It would be helpful if the authors could keep the same notation throughout the paper to avoid confusion.

      The protein synthesis rate per mRNA molecule is the best measure of translational efficiency, and we used the experimental data from Riba et al. (2019) for this purpose (lines 99-100). Alternatively, we also used the tRNA Adaptation Index (tAI) as a measure of translational efficiency, as the codon choice also influences the rate at which an mRNA molecule is translated (Tuller et al., 2010) (line 192).

      (3) On line 227, does "higher translation rate" mean "higher translation initiation rate"? The same issues happen in a few places in this paper.

      Corrected now (line 243 in the revised manuscript and throughout the manuscript).

      (4) The discussion from lines 296 to 301 is unclear. It is not obvious to me how the authors obtained the conclusion that lowering translational efficiency would decrease the protein expression noise.

      High translational efficiency will require more ribosomes and hence will increase ribosome demand. If ribosome demand is the molecular basis of high expression noise for genes with bursty transcription and high translational efficiency, then we can expect a reduction in ribosome demand and a reduction in noise if we lower the translational efficiency. We have rephrased this section for clarity between lines 334 and 339 in the revised manuscript.

      (5) On line 324, should slower translation mean a shorter distance between neighboring ribosomes? One can imagine the extreme limit in which ribosomes move very slowly so that the mRNA is fully packed with ribosomes.

      Slower translation or ribosome traversal rate would also lower the translation initiation rate (Barrington et al., 2023). Slower traversal of ribosomes reduces the chances of collision in case of transient slow-down of ribosomes due to occurrence of one or more non-preferred codons. We have now clarified this part in the lines 360 to 369 in the revised manuscript.

      (6) The text from lines 423 to 433 can be put in Methods.

      We have already added this part to the methods section (lines 900 to 910) and have now minimized this discussion in the results section.

      (7) The discussion from lines 128 to 130 is unclear, and the statement appears to be consistent with the two-state model (see Figure 1E). The meaning of "initial mRNA numbers" is also unclear.

      An earlier study proposed that essential genes in yeast employ a high-transcription, low-translation strategy for expression, likely to maintain low expression noise in these genes and to prevent the detrimental effects of high expression noise (Fraser et al., 2004). However, there has been no direct supportive evidence. Therefore, we used stochastic simulations to test whether differences in the mRNA levels and translational efficiency of genes can lead to differences in protein noise. The discussion between lines 130 and 132 in the revised manuscript summarises the results of the simulations.

      "Initial mRNA numbers" refers to the mRNA copy numbers that are present in the cell at the start of the stochastic simulations. However, we have now changed it to ‘mRNA levels’ in the revised manuscript for clarity (line 131 in the revised manuscript).

      (8) On line 212, is the translation initiation rate TL_init the same thing as beta_p in Figure 3A?

      βp refers to the rate of protein synthesis, which is influenced by the translational burst kinetics as well as the translation initiation rate, whereas TLinit refers to the translation initiation rate. So, these parameters are related, but are not the same.

    1. eLife Assessment

      Floeder and colleagues provide an important investigation that describes the experimental conditions that systematically produce "ramps" in dopamine signaling in the striatum. This somewhat nebulous feature of dopamine has been a significant part of recent theoretical and computational debates attempting to formally describe the different timescales on which dopamine functions. The current results are convincing and add context to that ongoing work.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and Instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem somewhat to bridge the gap between tonic and phasic dopamine, with an intense discussion still being held in the field about what their actual behavioral role is. Here, in tests with head-fixed mice, and dopamine being recorded with a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors propose that although these findings can be explained by classical theories of dopamine function, they are better explained by their model of Adjusted Net Contingency of Causal Relation (ANCCR). The results of this study provide constraints on future models of dopamine function, and are of high interest to the field.

    3. Reviewer #2 (Public review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring in the time between a predictive conditioned stimulus and a subsequent reward. They found this relationship occurring within two different tasks with mice, during both a Pavlovian task as well as an instrumental virtual visual navigation task. Additionally, they observed this relationship only in conditions when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR in which the time constant of the eligibility trace is proportionate to the reward rate within the task.

      The relationship between ITI duration and the extent of a dopamine ramp which the authors have reported is very intriguing and certainly provides an important constraint for models for dopamine function. As such, these findings are potentially highly impactful to the field.

    4. Reviewer #3 (Public review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented. The additional data, analyses, and discussion in the revised version of the paper add strength and clarity to the conclusions.

      The current results raise interesting questions regarding what potential function, if any, cue-reward interval dopamine ramps serve. In the current data, licking behavior was similar on different trial types and was not related to ramping activity.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and Instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem somewhat to bridge the gap between tonic and phasic dopamine, with an intense discussion still being held in the field about what their actual behavioral role is. Here, in tests with head-fixed mice, and dopamine being recorded with a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors show that these findings are well explained by their previously published model of Adjusted Net Contingency of Causal Relation (ANCCR).

      Strengths:

      This descriptive study delineates some fundamental parameters that define dopamine ramps in the studied conditions. The short, objective, and to-the-point format of the manuscript is great and really does a service to potential readers. The authors are very careful with the scope of their conclusions, which is appreciated by this reviewer.

      We thank the reviewer for their overall support of the formatting and scope of the manuscript.

      Weaknesses:

      The discussion of the results is very limited to the conceptual framework of the authors' preferred model (which the authors do recognize, but it still is a limitation). The correlation analysis presented in panel l of Figure 3 seems unnecessary at best and could be misleading, as it is really driven by the categorical differences between the two conditions that were grouped for this analysis. There are some key aspects of the data and their relationship with each other, the previous literature, and the methods used to collect them, that could have been better discussed and explored.

      We agree with the reviewer that a weakness of the discussion was the limited framing of the results within the ANCCR model. To address this, we have expanded our introduction and discussion sections to provide a more thorough explanation of our model and possible leading alternatives.

      We thank the reviewer for pointing out that Figure 3l may be misleading for readers; we removed this panel from the revised Figure 4.

      We have further addressed the specific concerns raised by the reviewer in their comments to the authors. Indeed, we agree with the reviewer that the original manuscript was narrow in its focus regarding relationships between different aspects of the data. To more thoroughly explore how key variables, including dopamine ramp slope and onset response as well as licking behavior slope, could relate to each other, we have added Extended Data Figure 8. In this figure, we show that no correlations exist between any of these key variables in either dynamic tone condition; it is our hope that this additional analysis highlights the significance of the clear relationship between dopamine ramp slope and ITI duration.

      Reviewer #2 (Public Review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring in the time between a predictive conditioned stimulus and a subsequent reward. They found this relationship occurring within two different tasks with mice, during both a Pavlovian task as well as an instrumental virtual visual navigation task. Additionally, they observed this relationship only in conditions when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR in which the time constant of the eligibility trace is proportionate to the reward rate within the task.

      The relationship between ITI duration and the extent of a dopamine ramp which the authors have reported is very intriguing and certainly provides an important constraint for models for dopamine function. As such, these findings are potentially highly impactful to the field. I do have a few questions for the authors which are written below.

      We thank the reviewer for their interest in our findings and belief in their potential to be impactful in the field.

      (1) I was surprised to see a lack of counterbalance within the Pavlovian design for the order of the long vs short ITI. Ramping of the lick rate does increase from the long-duration ITIs to the short-duration ITI sessions. Although of course, this increase in ramping of the licking across the two conditions is not necessarily a function of learning, it doesn't lend support to the opposite possibility that the timing of the dynamic CS hasn't reached asymptotic learning by the end of the long-duration ITI. The authors do reference papers in which overtraining tends to result in a reduction of ramping, which would argue against this possibility, yet differential learning of the dynamic CS would presumably be required to observe this effect. Do the authors have any evidence that the effect is not due to heightened learning of the timing of the dynamic CS across the experiment?

      We appreciate the reviewer expressing their surprise regarding the lack of counterbalance in our Pavlovian experimental design. We did not originally counterbalance the training order because the ramps disappeared in the short ITI/fixed tone condition, indicating that their presence is not just a matter of total experience in the task. However, we agree that this is incidental rather than direct evidence. To address this drawback, we repeated the Pavlovian experiment in a new cohort of animals with a revised training order, switching conditions such that the short ITI/dynamic tone (SD) condition preceded the long ITI/dynamic tone (LD) condition (see revised Figure 2a). Despite this change in the training order, the main findings remain consistent: positive dLight slopes (i.e., dopamine ramps) are only observed in the SD condition (Figure 2b-d).

      We thank the reviewer for raising these questions regarding licking behavior and learning and their relationship with dopamine ramps. Indeed, a closer look at the average licking behavior reveals subtle differences across conditions (Figure 1f and Extended Data Figure 5a). While the average lick rate during the ramp window does not differ across conditions (Extended Data Figure 5c), the ramping of the lick rate during this window is higher for dynamic tone conditions compared to fixed tone conditions (Extended Data Figure 5d). Despite these differences, we still believe that the main comparison between the dopamine slope in the SD vs LD condition remains valid given their similar lick ramping slopes. Furthermore, our primary measure of learning is not lick slope, but anticipatory lick rate during the 1 s trace preceding reward delivery, which is robustly nonzero across cohorts and conditions (Figure 1g and Extended Data Figure 5b).

      Taken together, we hope that the results from our counterbalanced Pavlovian training and more rigorous analysis of lick behavior across conditions provide sufficient evidence to assuage concerns that the differences in ramping dopamine simply reflect differences in learning.

      (2) The dopamine response, as measured by dLight, seems to drop after the reward is delivered. This reduction in responding also tends to be observed with electrophysiological recordings of dopamine neurons. It seems possible that during the short ITI sessions, particularly on the shorter ITI duration trials, dopamine levels may still be reduced from the previous trial at the onset of the CS on the subsequent trial. Perhaps the authors can observe the dynamics of the recovery of the dopamine response following a reward delivery on longer-duration ITIs in order to determine how quickly dopamine is recovering following a reward delivery. Are the trials with very short ITIs occurring within this period that dopamine is recovering from the previous trial? If so, how much of the effect may be due to this effect? It should be noted that the lack of observance of a ramp on the condition of short-duration ITIs with fixed CSs provides a potential control for this effect, yet the extent to which a natural ramp might occur following sucrose deliveries should be investigated.

      We thank the reviewer for highlighting the possibility that ramps may be due to the dopamine response recovery following reward delivery. Given that peak reward dopamine responses tend to be larger in long ITI conditions, however, we felt that it was inappropriate to compare post-reward dopamine recovery times across conditions. Instead, we decided to directly compare the dLight slope 2 s before cue onset ("pre-cue window," a proxy for recovery from previous trial) with the dLight slope during our ramp window from 3 to 8 s after cue onset (Extended Data Figure 6a). There were no significant differences in pre-cue dLight slope across conditions (Extended Data Figure 6b); this suggests that the ramping slopes seen in the SD condition, but not other conditions, are not simply due to the natural dopamine recovery response following reward delivery. Furthermore, if the dopamine ramps observed in the SD condition were a continuation of the post-reward dopamine recovery from the previous trial, we would expect to see a positive correlation between the dLight slope before and during the cue. However, there is no such correlation between the dLight slopes in the ramp window vs. pre-cue window in the SD condition (Extended Data Figure 6c-d). We believe that this observation, along with the built-in control of the SF condition mentioned by the reviewer, serves as evidence against the possibility of our ramp results being due to a natural ramp after reward delivery.
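
      The window comparison described above can be illustrated with a minimal sketch on synthetic data. This is not the analysis code used for the manuscript: the 20 Hz sampling rate, the noise level, the ramp size, and the helper name `window_slope` are all assumptions made for illustration. The sketch fits a line to each trial's trace in the pre-cue window (2 s before cue onset) and in the ramp window (3 to 8 s after cue onset), then correlates the two slopes across trials.

```python
import numpy as np

def window_slope(t, signal, start, stop):
    """Least-squares slope of `signal` over start <= t < stop (signal units per second)."""
    mask = (t >= start) & (t < stop)
    slope, _intercept = np.polyfit(t[mask], signal[mask], 1)
    return slope

# Synthetic traces (assumed values): time is relative to cue onset at t = 0.
t = np.arange(-2.0, 8.0, 0.05)          # hypothetical 20 Hz sampling
rng = np.random.default_rng(0)
n_trials = 50

pre_slopes, ramp_slopes = [], []
for _ in range(n_trials):
    trace = rng.normal(0.0, 0.1, t.size)            # flat, noisy baseline
    trace += 0.2 * np.clip(t - 3.0, 0.0, None)      # ramp beginning 3 s after cue onset
    pre_slopes.append(window_slope(t, trace, -2.0, 0.0))   # pre-cue window
    ramp_slopes.append(window_slope(t, trace, 3.0, 8.0))   # ramp window

pre_slopes = np.array(pre_slopes)
ramp_slopes = np.array(ramp_slopes)

# Across-trial correlation between pre-cue slope and ramp-window slope.
r = np.corrcoef(pre_slopes, ramp_slopes)[0, 1]
print(f"mean pre-cue slope: {pre_slopes.mean():.3f}")
print(f"mean ramp slope:    {ramp_slopes.mean():.3f}")
print(f"correlation r:      {r:.3f}")
```

      If a ramp were merely a continuation of post-reward recovery, the two slopes would be expected to covary positively across trials; in this synthetic example they are independent by construction.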

      (3) The authors primarily relate the finding of the correlation between the ITI and the slope of the ramp to their ANCCR model by suggesting that shorter time constants of the eligibility trace will result in more precisely timed predictors of reward across discrete periods of the dynamic cue. Based on this prediction, would the change in slope be more gradual, and perhaps be more correlated with a broader cumulative estimate of reward rate than just a single trial?

      To clarify, we do not propose that a smaller eligibility trace time constant results in more precise timing per se. Instead, we believe that the rapid eligibility trace decay from smaller time constants gives greater causal predictive power for later periods in the dynamic cue (see Extended Data Figure 1) since the memory of the earlier periods of the cue is weaker.
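
      The intuition can be made concrete with a toy calculation. This is only a sketch that assumes a simple exponentially decaying trace; it is not the ANCCR implementation, and the segment onsets, reward time, and time constants below are hypothetical. It compares how much stronger the memory of a late cue segment is than that of an early segment at the moment of reward.

```python
import numpy as np

def trace_at_reward(segment_times, reward_time, time_constant):
    """Exponentially decayed trace of each cue segment, evaluated at reward time."""
    elapsed = reward_time - np.asarray(segment_times, dtype=float)
    return np.exp(-elapsed / time_constant)

segments = np.array([0.0, 2.0, 4.0, 6.0])   # hypothetical onsets of cue segments (s)
reward_time = 9.0                            # hypothetical reward time (s)

ratios = {}
for T in (2.0, 20.0):                        # small vs. large time constant (s)
    trace = trace_at_reward(segments, reward_time, T)
    ratios[T] = trace[-1] / trace[0]         # late-segment trace relative to early-segment trace
    print(f"T = {T:>4} s: late/early trace ratio = {ratios[T]:.2f}")
```

      With the small time constant the late segment dominates the trace at reward, whereas with the large time constant early and late segments are remembered almost equally.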

      We appreciate the reviewer's curiosity regarding the influence of a broader cumulative estimate of reward vs. only the immediately preceding ITI on dopamine ramp slopes. Indeed, in several instrumental tasks (e.g., Krausz et al., Neuron, 2023), recent reward rate modulates the magnitude of dopamine ramps, making this an important variable to investigate. We chose to use linear regression for each mouse separately to analyze the relationship between the trial dopamine slope and the average previous ITI for the past 1 through 10 most recent trials. In the SD condition, as reported in our earlier manuscript, there was a significantly negative dependence of trial dopamine slope with the single previous ITI (i.e., if the previous ITI was long, the next trial tends to have a weaker ramp). This negative dependence, however, only held for a single previous trial; there was no clear relationship between the per-trial dopamine slope and the average of the past 2 through 10 ITIs (Extended Data Figure 7a). For the LD condition, on the other hand, there is no clear relationship between the per-trial dopamine slope and the average previous ITI for any of the past 1 through 10 trials, with one exception: there is a significantly negative dependence of trial dopamine slope with the average ITI of the previous 2 trials (Extended Data Figure 7b). This longer timescale relationship in the LD condition suggests that the adaptation of the eligibility trace time constant is nuanced and depends on the general ITI length.
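
      A regression of this kind could be sketched as follows for a single mouse. This is an illustrative sketch on synthetic data, not the code used for Extended Data Figure 7: the function name `history_regression`, the exponential ITI distribution with an 8 s mean, and the noise level are assumptions. For each history length k from 1 to 10, each trial's slope is regressed on the mean of the k ITIs preceding it.

```python
import numpy as np

def history_regression(itis, slopes, max_k=10):
    """Regress per-trial slope on the mean of the k most recent ITIs, for k = 1..max_k.

    itis[i] is the ITI immediately preceding trial i.
    Returns {k: (regression coefficient, Pearson r)}.
    """
    itis = np.asarray(itis, dtype=float)
    slopes = np.asarray(slopes, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(itis)))
    results = {}
    for k in range(1, max_k + 1):
        # Mean of the k ITIs ending with the one that precedes each trial;
        # only trials with a full k-trial history are included.
        mean_prev = (csum[k:] - csum[:-k]) / k
        y = slopes[k - 1:]
        beta, _intercept = np.polyfit(mean_prev, y, 1)
        r = np.corrcoef(mean_prev, y)[0, 1]
        results[k] = (beta, r)
    return results

# Synthetic session (assumed values): slope depends only on the single previous ITI.
rng = np.random.default_rng(1)
itis = rng.exponential(8.0, size=2000)
slopes = -0.01 * itis + rng.normal(0.0, 0.02, size=2000)

results = history_regression(itis, slopes)
for k, (beta, r) in results.items():
    print(f"k = {k:2d}: beta = {beta:+.4f}, r = {r:+.3f}")
```

      Because only the single preceding ITI drives the slope in this synthetic example, the correlation weakens as more trials are averaged into the history.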

      In general, though we reason that the eligibility trace time constant should depend on overall event rates, we do not currently propose a real-time update rule for the eligibility trace time constant depending on recent event rates. Accordingly, we are currently agnostic about the actual time scale of history of recent event rate calculation that mediates the eligibility trace time constant. Our experimental results suggest that when the ITI is generally short for Pavlovian conditioning, the eligibility trace time constant adapts to ITI on a rapid timescale. However, only a small fraction of the variability of this rapid fluctuation is captured by recent ITI history. A more thorough investigation of this real-time update rule would need to be done in the future.

      Reviewer #3 (Public Review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      Strengths:

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented.

      We thank the reviewer for recognizing the context that our study adds to the dopamine literature and the potential for our experiments to guide future work.

      Weaknesses:

      It remains somewhat unclear what limits are in place on the extent to which an eligibility trace is reflected in dopamine signals. In the current study, a specific set of ITIs was used, and one wonders if the relative comparison of ITI/history variables ("shorter" or "longer") is a factor in how the dopamine signal emerges, in addition to the explicit length ("short" or "long") of the ITI. Another experimental condition, where variable ITIs were intermingled, could perhaps help clarify some remaining questions.

      Though we used ITIs of fixed means, due to the exponential nature of their distribution, we did intermingle ITIs of various durations in both our long and short ITI conditions. The distribution of ITI durations is visualized in Figure 1c for Pavlovian conditioning and Extended Data Figure 9b for VR navigation.

      The relative comparison between consecutive ITIs was not something we originally explored, so we thank the reviewer for wondering how it impacts the dopamine signal. To investigate this, we quantified both the change in ITI (+ or - Δ ITI for relatively longer or shorter, respectively) and the change in dopamine ramp slope between consecutive trials in the SD condition (Figure 3d). Analyzing each mouse separately, we found a significantly negative relationship between Δ slope and Δ ITI (Figure 3e-f). Also, the average Δ slope was significantly greater for consecutive trials with a Δ ITI below -1 s compared to trials with a Δ ITI above +1 s (Figure 3g). Altogether, these findings suggest that relative comparison of ITIs does correlate with changes in the dopamine signal; a relatively longer ITI tends to have a weaker ramp, which fits in nicely with the expected inverse relationship between ITI and dopamine ramp slope from our ANCCR model.
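
      The consecutive-trial comparison can be sketched in a few lines. Again, this is a hedged illustration on synthetic data rather than the analysis behind Figure 3: the function names, the 8 s mean ITI, and the noise level are assumptions, while the ±1 s threshold follows the text above.

```python
import numpy as np

def consecutive_changes(itis, slopes):
    """Trial-to-trial change in ITI and in ramp slope (trial n+1 minus trial n)."""
    d_iti = np.diff(np.asarray(itis, dtype=float))
    d_slope = np.diff(np.asarray(slopes, dtype=float))
    return d_iti, d_slope

def summarize(d_iti, d_slope, threshold=1.0):
    """Regression coefficient of delta-slope on delta-ITI, plus the mean delta-slope
    for relatively shorter (delta ITI < -threshold) and longer (> +threshold) ITIs."""
    beta, _intercept = np.polyfit(d_iti, d_slope, 1)
    shorter = d_slope[d_iti < -threshold].mean()
    longer = d_slope[d_iti > threshold].mean()
    return beta, shorter, longer

# Synthetic session for one mouse (assumed values).
rng = np.random.default_rng(2)
itis = rng.exponential(8.0, size=500)
slopes = -0.01 * itis + rng.normal(0.0, 0.02, size=500)

d_iti, d_slope = consecutive_changes(itis, slopes)
beta, shorter, longer = summarize(d_iti, d_slope)
print(f"beta = {beta:+.4f}, mean d_slope (shorter ITI) = {shorter:+.3f}, (longer ITI) = {longer:+.3f}")
```

      A negative coefficient together with a larger mean Δ slope after relatively shorter ITIs is the pattern described above; the synthetic data reproduce it only because the inverse dependence was built in.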

      In both tasks, cue onset responses are larger, and longer on long ITI trials. One concern is that this larger signal makes seeing a ramp during the cue-reward interval harder, especially with a fluorescence method like photometry. Examining the traces in Figure 1i - in the long, dynamic cue condition the dopamine trace has not returned to baseline at the time of the "ramp" window onset, but the short dynamic trace has. So one wonders if it's possible the overall return to baseline trend in the long dynamic conditions might wash out a ramp.

      This is a good point, and we thank the reviewer for raising it. Certainly, the cue onset response is significantly larger in long ITI conditions (see Figure 1i-j and Figure 4h-j). To avoid any bleed-over effect, we intentionally chose ramp window periods during later portions of the trial (in line with work from others, e.g., Kim et al., Cell, 2020). While the cue onset dopamine pulse seems to have flatlined by the start of the ramp window period, the dopamine levels clearly remain elevated relative to pre-cue baseline. This type of signal has been observed with fiber photometry in other Pavlovian conditioning paradigms with long cue durations (e.g., Jeong et al., Science, 2022). Because of the persistently elevated dopamine levels, it is certainly possible that a ramping signal during the cue is getting washed out; with the bulk fluorescence photometry technique we employed in this study, this possibility is unfortunately difficult to completely rule out. However, the long ITI/fixed tone (LF) condition could serve as a potential control given the overall similarity in the dopamine signal between the LF and LD conditions: both conditions have large cue onset responses with elevated dopamine throughout the duration of the cue (see Extended Data Figures 2c and 3c). Critically, the LD condition lacks a noticeable ramp despite the dynamic tone providing information on temporal proximity to reward, which is thought to be necessary for dopamine ramps to occur. Importantly, regardless of whether a ramp is masked in the long ITI dynamic condition, most studies investigate such a condition in isolation and would report the absence of dopamine ramps. Thus, at a descriptive level, we believe it remains true that observable dopamine ramps are only present when the ITI is short.

      Not a weakness of this study, but the current results certainly make one ponder the potential function of cue-reward interval ramps in dopamine (assuming there is a determinable function). In the current data, licking behavior was similar on different trial types, and that is described as specifically not explaining ramp activity.

      We agree that this work naturally raises the question of the function of dopamine ramps. However, selective and precise manipulation of only the dopamine ramps without altering other features such as phasic responses, or inducing dopamine dips, is highly technically challenging at this moment; due to this challenge, we intentionally focused on the conditions that determine the presence or absence of dopamine ramps rather than their function. We agree with the reviewer that studying the specific function of dopamine ramps is an interesting future question.

      Reviewing Editor:

      The reviewers felt the results are of considerable and broad interest to the neuroscience community, but that the framing in terms of ANCCR undermined the scope of the findings as did the brief nature of the formatting of the manuscript. In addition, the reviewers felt that the relationship between ramp dynamics, behavior, and ITI conditions requires more in-depth analyses. Relatedly, the lack of counterbalancing of the ITI durations was considered to be a drawback and needs to be addressed as it may affect the baseline. Addressing these issues in a satisfactory manner would improve the assessment of the manuscript to important/convincing.

      We truly appreciate the valuable feedback provided on this manuscript by all three reviewers and the reviewing editor. Based on this input, we have significantly revised the manuscript to address the issues brought up by the reviewers. Firstly, we have conducted additional experiments to counterbalance the ITI conditions for Pavlovian conditioning; this strengthened our results by confirming our original findings that ITI duration, rather than training order, is the key variable controlling the presence or absence of dopamine ramps. Secondly, we completed more rigorous analyses to further explore the relationship between dopamine dynamics, animal behavior, and ITI duration; we generally found no significant correlations between these variables, with a notable exception being our main finding between ITI duration and dopamine ramp slope. Finally, we revised and expanded our writing to both explain predictions from our ANCCR model in less technical language and explore how alternative theoretical frameworks could potentially explain our findings. In doing so, we hope that our manuscript is now more accessible and of interest to a broad audience of neuroscience readers.

      Reviewer #1 (Recommendations For The Authors):

      The study could be improved if the authors performed a more detailed comparison of how other theoretical frameworks, beyond ANCCR, could account for the observed findings. Also, the correlation analysis presented in panel l of Figure 3 seems unnecessary and potentially spurious, as the slope of the correlation is clearly mostly driven by the categorical differences between the two ITI conditions, which were combined for the analysis - it's not clear what the value of this analysis is beyond the group comparison presented in the following panel.

      Again, we thank the reviewer for elaborating on their concern regarding Figure 3l; we have removed it from the revised Figure 4.

      The relationship between ramp dynamics with the behavior and the large differences in cue onset responses between short and long ITI conditions could have been better explored. If I understand correctly the overarching proposal of this and other publications by this group, then the differences in cue responses is determined by the spacing of rewards in a somewhat similar way that the ramps are. So, is there a trial-by-trial correlation between the amplitude of the cue responses and the slope of the ramps? Is there a correlation between any of these two measures with the licking behavior, and if so, does it change with the ITI condition? A more thorough exploration of these relationships would help support the proposal of the primacy of inter-event spacing in determining the different types of dopamine responses in learning.

      There are certainly interesting relationships between dopamine dynamics, behavior, and ITI that we failed to explore in our original manuscript, and we appreciate the reviewer bringing them up. We found no correlation between dopamine ramp slope and cue onset response in either the SD or LD condition (Extended Data Fig 8a-b). Moreover, we found no correlation between either of these variables and the trial-by-trial licking behavior (Extended Data Fig 8c-f). Finally, there is no relationship between licking behavior and previous ITI duration (Extended Data Fig 8g-h), suggesting that behavioral differences do not account for differences in the dopamine ramp slope. Together, the lack of significant relationships between these other variables highlights the specific, clear relationship between ITI duration and dopamine ramp slope.

      Finally, another issue I feel could have been better discussed is how the particular settings of both tasks might be biasing the results. For example, there is an issue to be considered about how the dopamine ramp dynamics reported here, especially the requirement of a dynamic cue for ramps to be present, square with the previous published results by one of the authors - Mohebi et al, Nature, 2019. In that manuscript, rats were executing a bandit task where, to this reviewer's understanding, there was no explicit dynamic cue aside from the standard sensory feedback of the rats moving around in the behavior boxes to approach a nose poke port. Is the idea that this sensory feedback could function as a dynamic cue? If that's the case, then this short-scale, movement-related feedback should also function as a dynamic cue in a freely moving Pavlovian condition, when the animals must also move towards a reward delivery port, right? Therefore, could it be that the experimental "requirement" of a dynamic cue is only present in a head-fixed condition? One could phrase this in a different way to Steelman and potentially further the authors' proposal: perhaps in any slightly more naturalistic setting, the interaction of the animals with their environment always functions as a dynamic cue indicating proximity to reward, and this relationship was experimentally isolated by the use of head fixation (but not explicitly compared with a freely moving condition) in the present study. I think that would be an interesting alternative to consider and discuss, and perhaps explore experimentally at some point.

      We thank the reviewer for raising this important point regarding the influence of our experimental settings on our results. At first glance, it could appear that our results demonstrating the necessity of a dynamic cue for ramps in a head-fixed setting do not fit neatly with other results in a freely moving setup (e.g., Collins et al., Scientific Reports, 2016; Mohebi et al., Nature, 2019). Exactly as the reviewer states though, we believe that sensory feedback from the environment in freely moving preparations serves the same function as a dynamic progression of cues. We have considered the implications of methodological differences between head-fixed and freely moving preparations in the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      This comment relates indirectly to comment 3, in that the authors intermix theory throughout the manuscript. I think this would be fine if the experiment was framed directly in terms of ANCCR, but the authors specifically mention that this experiment wasn't developed to distinguish between different theories. As such, it seems difficult to assess the scope of the comments regarding theory within the paper because they tend to be specifically related to ANCCR. For instance, the last comment has broad implications of how the ramp might be related to the overall reward rate, an interesting finding that constrains classes of dopamine models rather than evidence just for ANCCR. Perhaps adding a discussion section that allows the authors to focus more on theory would be beneficial for this manuscript.

      We appreciate this suggestion by the reviewer. We have updated both our introduction and discussion sections to elaborate more thoroughly on theory.

      Reviewer #3 (Recommendations For The Authors):

      The paper could potentially benefit from the use of more accessible language to describe the conceptual basis of the work, and the predictions, and a bit of reformatting away from the brief structure with lots of supplemental discussion.

      For example, in the introduction, the line - "Varying the ITI was critical because our theory predicts that the ITI is a variable controlling the eligibility trace time constant, such that a short ITI would produce a small time constant relative to the cue-reward interval (Supplementary Note 1)". As far as I can tell, this is meant to get across the notion that dopamine represents some aspect of the time between rewards - dopamine signals will differ for cues following short vs long intervals between rewards.

      As written, the language of the paper takes a fair bit of parsing, but the notions are actually pretty simple. This is partly due to the brief format the paper is written in, where familiarity with the previous papers describing ANCCR is assumed.

      From a readability standpoint, and the potential impact of the paper on a broad audience, perhaps this could be considered as a point for revision.

      We thank the reviewer for pointing out the drawbacks of our technical language and brief formatting. To address this, we have removed the majority of the supplementary notes and expanded our introduction and discussion sections. In doing so, we hope that the conceptual foundations of this work, and potential alternative theoretical explanations, are accessible and impactful for a broad audience of readers.

    1. eLife Assessment

      This valuable study by Wu and Zhou combines neurophysiological recordings and computational modelling to address an interesting question regarding the sequence of events from sensing to action. Neurophysiological evidence remains incomplete: explicit mapping of saccade-related activity in the same neurons and a better understanding of the influence of the spatial configuration of stimulus and targets would be required to pinpoint whether such activity might contribute, even partially, to the observed results and interpretations. These results are of interest for neuroscientists investigating decision-making.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".

      Strengths:

      Linking the results to RNN simulations and simulated lesions.

      Weaknesses:

      Potential interpretational issues due to a lack of explicit evidence on the sizes and locations of the response fields of the neurons. For example, is the contra/ipsi effect explained by the fact that in the contra condition, the response target and the saccade might have infringed on the outer edges of the response fields?

    3. Author response:

      The following is the authors' response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      This valuable study by Wu and Zhou combined neurophysiological recordings and computational modelling to investigate the neural mechanisms that underpin the interaction between sensory evaluation and action selection. The neurophysiological results suggest non-linear modulation of decision-related LIP activity by action selection, but some further analysis would be helpful in order to understand whether these results can be generalised to LIP circuitry or might be dependent on specific spatial task configurations. The authors present solid computational evidence that this might be due to projections from choice target representations. These results are of interest for neuroscientists investigating decision-making.

      Strengths:

      Wu and Zhou combine awake behaving neurophysiology for a sophisticated, flexible visual-motion discrimination task and a recurrent network model to disentangle the contribution of sensory evaluation and action selection to LIP firing patterns. The correct saccade response direction for preferred motion direction choices is randomly interleaved between contralateral and ipsilateral response targets, which allows the dissociation of perceptual choice from saccade direction.

      The neurophysiological recordings from area LIP indicate non-linear interaction between motion categorisation decisions and saccade choice direction.

      The careful investigation of a recurrent network model suggests that feedback from choice target representations to an earlier sensory evaluation stage might be the source for this non-linear modulation and that it is an important circuit component for behavioural performance.

      The paper presents a possible solution to a central controversy about the role of LIP in perceptual decision-making, but see below.

      Weaknesses:

      The paper presents a possible solution to a central controversy about the role of LIP in perceptual decision-making. However, the authors could be more clear and upfront about their interpretational framework and potential alternative interpretations.

      Centrally, the authors' model and experimental data appear to test only that LIP carries out sensory evaluation in its RFs. The model explicitly parks the representation of choice targets outside the "LIP" module receiving sensory input. The feedback from this separate target representation then provides the non-linear modulation that matches the neurophysiology. However, they ignore the neurophysiological results that LIP neurons can also represent motor planning to a saccade target.

      The neurophysiological results with a modulation of the direction tuning by choice direction (contralateral vs ipsilateral) are intriguing. However, evaluating the neurophysiological results is difficult, because some of the information needed to exclude alternative explanations is missing. It would be good to see the actual distributions and sizes of the RF, which were determined based on visual responses, not with a delayed saccade task. There might be a simple spatial configuration, for example, RF and preferred choice target in the same (contralateral) hemifield, for which there is an increase in firing. It is a shame that we do not see what these neurons would do if only a choice target would be put in the RF, as has been done in so many previous LIP experiments. The authors also exclude some spatial task configurations (vertical direction decisions), which makes it difficult to judge whether these data and models can be generalised. The whole section is difficult to follow, partly also because it appears to mix reporting results with interpretation (e.g. "feedback").

      The model and its investigation are very interesting and thorough, but given the neurophysiological literature on LIP, it is not clear that the target module would need to be in a separate brain area; it could instead be local circuitry within LIP between different neuron types.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".

      Strengths:

      Linking the results to RNN simulations and simulated lesions.

      Weaknesses:

      Potential interpretational issues due to a lack of evidence on what happens at the time of the saccades.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The neurophysiological results with a modulation of the direction tuning by choice direction are intriguing. However, the evaluation of the neurophysiological results is difficult because some of the information necessary to exclude alternative explanations is missing.

      We thank the reviewer for the helpful comments. We have addressed this point in detail in the following response.

      (a) Clearly state in the results how the response field ("RF"), where the stimulus was placed, was mapped. The methods give an "MGS" task (i.e., spatial selectivity during stimulus presentation and delay) rather than the standard delayed saccade task, and also state: "while for those neurons which did not show a clear RF during the MGS task, we presented motion stimuli in the positions (always in the visual field contralateral to the recorded hemisphere) in which neurons exhibited the strongest response to the motion stimuli." All this sounds more like a sensory receptive field, not an eye movement response field. What was the exact task and criterion?

      We agree with the reviewer that the original description of how we mapped the response fields (RFs) of LIP neurons lacked sufficient detail. In this study, we used the memory-guided saccade (MGS) task to map the RFs of all isolated LIP neurons. Both MGS and delayed saccade tasks are commonly used to map a neuron's response field in previous decision-making studies.

      In the MGS task, monkeys initially fixate on the center of the screen. Subsequently, a dot randomly flashes at one of the eight possible locations surrounding the fixation dot with an eccentricity of 8 degrees, requiring the monkeys to memorize the location of the flashed dot. After a delay of 1000 ms, the monkeys are instructed to saccade to the remembered location once the fixation dot disappears. The MGS task is a standard behavioral task for mapping visual, memory, and motor RFs, particularly in brain regions involved in eye movement planning and control, such as LIP, FEF, and the superior colliculus.

      We believe the reviewer's confusion may stem from whether we mapped the visual, memory, or motor RFs of LIP neurons in the current study, as these "RFs" are not always consistent across individual neurons. In our study, we primarily mapped the visual and memory RFs of each LIP neuron by analyzing their activity during both the target presentation and delay periods. To focus on sensory evaluation-related activity, we presented the visual motion stimulus within the visual-memory RF of each neuron. For neurons that did not show a significant visual-memory RF, we used a different approach: we tested the neurons with the main task by altering the spatial configuration of the task stimuli to identify the visual field that elicited the strongest response when the motion stimulus was presented within it. This approach was used to guide the placement of the stimulus during the recording sessions.

      Following the reviewer’s suggestion, we have added the following clarification to the results section to better describe how we mapped the RF of LIP neurons:

      ‘We used the memory-guided saccade (MGS) task, which is commonly employed in LIP studies, to map the receptive fields (RFs) of all isolated LIP neurons. Specifically, we mapped both the visual and memory RFs of each neuron by analyzing their activity during the target presentation and delay periods of the MGS task (see Methods).’

      (b) l.85 / l126: What do you mean by "orthogonal to the axis of the neural RF" - was the RF shape asymmetric, if so how did you determine this? OR do you mean the motion direction axis? Please explain.

      We realized that the original description of this point may have been unclear and could lead to confusion. The axis of the neural RF refers to the line connecting the center of the RF (which coincides with the center of the motion stimulus) to the fixation dot. We have revised this sentence in the revised manuscript as follows:

      ‘To examine the neural activity related to the evaluation of stimulus motion, we presented the motion stimuli within the RF of each neuron, while positioning the saccade targets at locations orthogonal to the line connecting the center of the RF (which also marks the center of the motion stimulus) and the fixation dot.’

      (c) Behavioural task. Figure 1 - is this an example session? Please state this clearly. Can you show the examples (psychometric function and reaction times) separated for trials where the correct choice direction aligned with the motion preference (within 90 degrees) and those where it did not?

      Figure 1 shows the averaged behavioral results from all recording sessions. We have added this detail in the revised legend of Figure 1.

      We are uncertain about the reviewer’s reference to the “correct choice direction aligning with the motion preference,” as the term “motion preference” is specific to the neuronal response, which differs across neurons recorded simultaneously with a multichannel recording probe.

      Nonetheless, following the reviewer’s suggestion, we grouped the trials in each recording session into two groups based on the relationship between the saccade direction and the preferred motion direction of the identified LIP neuron during one example single-channel recording. Both the RT and the performance accuracy during one example session are shown in the following figure.

      Author response image 1.

      Give also the performance averaged across all sites included in this study and range. If performance does differ for different configurations, please show that the main modulatory effect does not align with this distinction.

      To clarify this point, we have plotted performance accuracy and RTs for horizontal, oblique, and vertical target position configurations separately, which are shown for both monkeys in the following figures. We did not observe any systematic influences of task configurations on the monkeys' performance accuracy. While the RTs did differ across different configurations, we believe these differences are likely attributable to several factors, such as varying levels of familiarity introduced by our training process and the intrinsic RT difference between different saccade directions.

      Author response image 2.

      (d) Show the distribution of RF positions and the direction preferences for the recording sites included in the quantitative analysis of this study. (And if available, separately those excluded).

      Following the reviewer’s suggestion, we have plotted the centers of the RFs for all neurons with identifiable RFs, categorizing them by their preferred motion directions. To determine each neuron’s RF, we analyzed the average firing rates from both the target presentation and delay periods during each trial of the memory-guided saccade (MGS) task. The RF centers of neurons with significant RFs were determined through a two-step process. First, we selected neurons that exhibited significant RFs in the MGS task based on the following criteria: 1) there must be a significant activity difference between the eight target locations, and 2) the mean activity during the selected periods must be significantly greater than the baseline activity during the fixation period. Second, we fitted the activity data from the eight conditions to a Gaussian distribution, using the center of the fitted distribution as the RF center. A substantial proportion of neurons from both monkeys that exhibited significant responses to motion stimuli did not exhibit notable RFs based on our current method. The following figures show the distributions of RFs and motion direction preference for all LIP neurons with identifiable RFs, separately for each monkey. Since this is not the focus of the current study, we do not plan to include this result in the revised manuscript.
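
      The second step of the RF procedure described above can be illustrated with a minimal sketch. It is not the authors' fitting code: it assumes a one-dimensional Gaussian tuning curve over the polar angle of the eight MGS targets (eccentricity is fixed in the task), and the function names, the grid ranges, and the least-squares criterion are all hypothetical choices made for illustration.

```python
import math

# Polar angles (deg) of the eight MGS targets. Eccentricity is fixed in the
# task, so this sketch estimates only the angular position of the RF centre.
ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def fit_rf_center(rates, angles=ANGLES):
    """Fit rate = base + amp * exp(-d^2 / (2 * sigma^2)) by grid search.

    For every candidate centre and width, base and amp are obtained by
    ordinary least squares; the candidate with the smallest squared error is
    returned as (sse, center_deg, sigma_deg, amp, base). Returns None when no
    candidate yields a positive peak (for example, a flat response profile).
    """
    n = len(rates)
    r_mean = sum(rates) / n
    best = None
    for center in range(0, 360):            # hypothetical 1-degree grid
        for sigma in range(20, 121, 5):     # hypothetical width grid (deg)
            g = [math.exp(-circ_diff(a, center) ** 2 / (2.0 * sigma ** 2))
                 for a in angles]
            g_mean = sum(g) / n
            sgg = sum((x - g_mean) ** 2 for x in g)
            if sgg == 0:
                continue
            amp = sum((x - g_mean) * (r - r_mean)
                      for x, r in zip(g, rates)) / sgg
            if amp <= 0:                    # require a peak, not a trough
                continue
            base = r_mean - amp * g_mean
            sse = sum((base + amp * x - r) ** 2 for x, r in zip(g, rates))
            if best is None or sse < best[0]:
                best = (sse, center, sigma, amp, base)
    return best
```

      In such a sketch, the significance screening from the first step would be applied before the fit, so that only neurons passing both criteria receive an RF centre.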

      Author response image 3.

      (e) Following on from d), was there a systematic relationship between RF position or direction preference and modulation by choice direction? For instance could the responses be simply explained by an increase in modulation for choices into the same (contralateral) hemifield as where the stimulus was placed?

      The reviewer raised a good point. To address whether there was a systematic relationship between RF position or direction preference and modulation by choice direction, we calculated a modulation index for each neuron to quantify the influence of saccade direction on neuronal responses to motion stimuli. We then plotted the modulation index against the RF position for each LIP neuron, as shown below:

      Author response image 4.

      As shown in the figures above, neurons with RFs farther from the horizontal meridian were more likely to exhibit stronger modulation by the saccade direction, while neurons with RFs closer to the horizontal meridian showed inconsistent and weaker modulation. This is because when the RF was on the horizontal meridian, saccade directions were aligned with the vertical axis (with no contralateral or ipsilateral directions). This is consistent with the finding in Figure S3: no significant differences in direction selectivity between the CT and IT conditions in the data sessions where the saccade targets were aligned close to the vertical direction. Since fewer than half of the identified neurons showed clear receptive fields using our method, the figure above does not include all the neurons used in the analysis in the manuscript. Therefore, we chose not to include this figure in the revised manuscript.

      Additionally, we quantified the relationship between the modulation index and direction preference for neurons in sessions where the monkeys’ saccades were aligned to either horizontal or oblique directions. As shown in the following figure, no systematic relationship was found between direction preference and modulation by the choice direction for LIP neurons at the population level.

      Author response image 5.

      We have added this result as Figure S 2 in the revised manuscript.

      Notably, the observed modulation of saccade direction on LIP neurons’ response to motion stimuli cannot be simply explained by saccade direction selectivity. We presented two further lines of evidence to rule out this possibility in the original manuscript. First, the modulation effect we observed was nonlinear; specifically, the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2I and Figure S1A-D). This phenomenon is unlikely to be attributed to a linear gain modulation driven by saccade directions. Second, we plotted the averaged neural activity for contralateral and ipsilateral saccade directions separately, and found that LIP neurons showed similar levels of activity between the two saccade directions (revised Figure 2L).

      Additionally, we added a paragraph in the Methods section to describe the way we calculated modulation index as follows:

      “We have calculated a modulation index for each neuron to reflect the influence of saccade direction on the neuron’s response to visual stimuli. The modulation index is calculated as:

      where each term is the average firing rate from 50 ms to 250 ms after sample onset; the first is computed over all contralateral saccade trials with the neuron’s preferred motion direction of the visual stimuli, and the naming conventions are the same for the other three terms. An MI value between 0 and 1 indicates higher modulation in contralateral saccade trials, and an MI value between -1 and 0 indicates higher modulation in ipsilateral saccade trials.”
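
      The formula and its symbols did not survive in this copy of the response, so the sketch below should not be read as the authors' definition. It assumes one hypothetical contrast form that is consistent with the stated range (-1 to 1) and sign convention: the direction selectivity (preferred minus non-preferred rate) in contralateral trials is contrasted with that in ipsilateral trials. The function names and the absolute-value normalisation are assumptions.

```python
def window_rate(spike_times_ms, start_ms=50.0, stop_ms=250.0):
    """Firing rate (spikes/s) within [start_ms, stop_ms) after sample onset."""
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < stop_ms)
    return n_spikes / ((stop_ms - start_ms) / 1000.0)


def modulation_index(ct_pref, ct_nonpref, it_pref, it_nonpref):
    """Hypothetical contrast index bounded in [-1, 1].

    Each argument is a trial-averaged firing rate for one combination of
    saccade direction (contralateral/ipsilateral) and motion direction
    (preferred/non-preferred). Positive values mean stronger direction
    selectivity in contralateral saccade trials.
    """
    ds_ct = abs(ct_pref - ct_nonpref)   # selectivity, contralateral trials
    ds_it = abs(it_pref - it_nonpref)   # selectivity, ipsilateral trials
    if ds_ct + ds_it == 0:
        return 0.0                      # no selectivity in either condition
    return (ds_ct - ds_it) / (ds_ct + ds_it)
```

      With this assumed form, a neuron whose rates are 30 and 10 spikes/s in contralateral trials and 20 and 10 spikes/s in ipsilateral trials would receive a positive index, matching the sign convention quoted above.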

      Please split Figures 2G,H,I J,K, by whether the RF was located contralaterally or ipsilaterally. If there are only a small number of ipsilateral RFs, please show these examples, perhaps in an appendix.

      This is a reasonable suggestion; however, it is not applicable to our study. Among all the neurons included in our analysis, only one neuron from each monkey exhibited ipsilateral receptive fields (RFs). Therefore, we believe it may not be necessary to plot the result for this outlier.

      (f) Were the choice targets always equi-distant from the stimulus and at what distance was this? Please give quantitative details in methods.

      The reviewer was correct that the choice targets were always equidistant from the stimulus. The distance between the motion stimulus and the target was typically 12-15 degrees. We have added the details in the revised Methods section as follows:

      ‘Therefore, the two saccade targets were equidistant from the stimulus, with the distance typically ranging from 12 to 15 degrees.’

      (2) For Figure 3E, how do you explain that there is an upregulation for contralateral choices before the stimulus onset, i.e. before the animal can make a decision? Is this difference larger for error trials?

      This is a good question, which we have attempted to clarify in the revised manuscript. We believe that the observed upregulation in neural activity for contralateral choices may reflect the monkeys’ internal choice bias or expectation (choice between two motion directions) prior to stimulus presentation, which could influence their subsequent decisions. In Figure 3E, we calculated the r-decision to assess the correlation between the neuron’s direction selectivity and the monkeys’ decisions on motion stimuli, separately for contralateral and ipsilateral choice conditions. The increased r-decision during the pre-stimulus period indicates stronger neural activity for trials in which the monkeys later reported that the upcoming stimulus was in the preferred direction, and weaker activity for trials where the stimulus was judged to be in the non-preferred direction. This correlation was more pronounced for contralateral choices than for ipsilateral ones. It is important to note that while the monkeys cannot predict the upcoming stimulus direction with greater-than-chance accuracy, these results suggest that pre-stimulus neural activity in LIP is correlated with the monkeys’ eventual decision for that trial. Furthermore, LIP neural activity was more strongly correlated with the monkeys’ decisions in the contralateral choice condition compared to the ipsilateral one.

      Additionally, we clarify that the r-decision was calculated using both correct and error trials. When comparing Figure 2J with Figure 2K, the correlation between neural activity and the monkeys’ upcoming decision during the pre-stimulus period was most prominent in low- and zero-coherence trials, where the monkeys either made more errors or based decisions on guesswork. We infer that the monkeys' confidence in these decisions was likely lower compared to high-coherence trials. Thus, the decision process appears to be influenced by pre-stimulus neural activity, particularly in low-coherence and zero-coherence trials.

      Although it is unclear precisely what covert process this pre-stimulus activity reflects, similar patterns of choice-predictive pre-stimulus activity have been observed in LIP and other brain areas (Shadlen, M.N. and Newsome, W.T., 2001; Coe, B. et al., 2002; Basso, M.A. and Wurtz, R.H., 1998; Williams, Z.M. et al., 2003). We have clarified this point in the revised manuscript, including a revision of the relevant sentence in the Results section for clarity, shown as follows:

      “Furthermore, we used partial correlation analysis to examine decision- and stimulus-related components of DS (i.e., r-decision and r-stimulus, Figure 3E and 3F) using all four coherence levels. The decision-related component of LIP DS was significantly greater in the CT condition than in the IT condition (Figure 3E; nested ANOVA: P = 1.07e-6, F = 25.72), and this difference emerged even before motion stimulus onset. This suggests that the LIP DS was more closely correlated with monkeys’ decisions in the CT condition than in the IT condition. The upregulation in r-decision for contralateral choices may reflect the monkeys’ internal choice bias or expectation (choice between two motion directions) prior to stimulus presentation, which could influence their subsequent decisions more in the CT condition.”
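
      The partial correlation named in the quoted passage can be sketched with the standard first-order formula. The sketch makes assumptions that the response does not state: firing rate, decision, and stimulus are each coded as one number per trial (for example, signed coherence for the stimulus and +1/-1 for the choice), and the helper names are hypothetical. The authors' exact coding and time windows are not given here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def decision_and_stimulus_components(rate, decision, stimulus):
    """Return (r_decision, r_stimulus) for one neuron and time window.

    r_decision: rate vs. decision, controlling for the stimulus.
    r_stimulus: rate vs. stimulus, controlling for the decision.
    """
    return (partial_corr(rate, decision, stimulus),
            partial_corr(rate, stimulus, decision))
```

      Error and zero-coherence trials are what make the two components separable: if the decision always matched the stimulus, the two regressors would be perfectly correlated and the denominator above would be zero.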

      (3) Figure 2K: what is the very large condition-independent contribution? It almost seems as most of what these neurons code for is neither saccade or motion related.

      The condition-independent contribution is the time-dependent component that is unrelated to saccade, motion, or their interaction. Our findings are consistent with previous methodological studies, where this time-dependent component was shown to account for a significant portion of the variance in population activity (Kobak, D. et al., 2016).

      (4) Abstract:

      a) "We found that the PPC activity related to monkeys' abstract decisions about visual stimuli was nonlinearly modulated by monkeys' following saccade choices directing outside each neuron's response field."

      This sentence is not clear/precise in two regards:

      Should "directing" be "directed"?

      Also, it is not just saccades directed outside the RF, but towards the contralateral hemifield.

      We thank the reviewer for the suggestion. We agree that ‘directing’ should be ‘directed’ and revised it accordingly. However, we do not believe that ‘directed outside each neuron's response field’ should be replaced with “towards the contralateral hemifield”. There are two major reasons. First, the modulation effect was identified as the difference between contralateral and ipsilateral saccade directions. We cannot conclude that the modulation mainly happened in the contralateral saccade direction. Second, we used ‘directed outside each neuron's response field’ to emphasize that this modulation cannot be simply explained by saccade direction selectivity, whereas ‘towards the contralateral hemifield’ cannot fulfill this purpose.

      (b) " Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, mediated such feedback modulation."

      - should be "that feedback connection .... might mediate". A model can only ever give a possible explanation.

      Thanks for the help on the writing again! We have revised this sentence as follows: “Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, might mediate such feedback modulation.”

      (c) "thereby increasing the consistency of flexible decisions." I am not sure what is really meant by increasing the consistency of flexible decisions? More correct or more the same?

      We apologize for the confusion. In the manuscript, "decision consistency" refers to the degree of agreement in the model's decisions under specific conditions. A higher decision consistency indicates that the model is more likely to produce the same choice when it encounters a stimulus in that condition. We have incorporated your suggestion and revised this sentence to “thereby increasing the reliability of flexible decisions”. We also clarified the definition of consistency in the main text as follows:

      “These disrupted patterns of saccade DS observed in the target module following projection-specific inactivation aligned with the decreased decision consistency of RNNs, where decision consistency reflects the degree of agreement in the model's choices under specific task conditions. This suggests a diminished reliance on sensory input and an increased dependence on internal noise in the decision-making process.”
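
      The response defines decision consistency only verbally, as the degree of agreement in the model's choices under a given task condition. One way to make that concrete, offered here as an assumption rather than the authors' measure, is the fraction of repeated trials in a condition that end in the most frequent choice. The function name and the choice labels are hypothetical.

```python
from collections import Counter


def decision_consistency(choices):
    """Fraction of trials in one condition that yield the modal choice.

    `choices` holds the model's choice on repeated trials of the same
    condition (same stimulus and target configuration, different noise).
    For a two-alternative task the value lies between 0.5 and 1.0.
    """
    if not choices:
        raise ValueError("at least one trial is required")
    counts = Counter(choices)
    return max(counts.values()) / len(choices)
```

      Under this reading, a network driven mainly by internal noise would approach 0.5 on a two-choice task regardless of the stimulus, which is the pattern the quoted text attributes to the networks after projection-specific inactivation.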

      (5) Results: headers should be changed to reflect the actual results, not the interpretation:

      "Nonlinear feedback modulation of saccade choice on visual motion selectivity in LIP"

      "Feedback modulation specifically impacted the decision-correlated activity in LIP"

      These first parts of the results describe neurophysiological modulations of LIP activity; the source cannot be known from the presented data alone. I thought that this feedback is suggested by the modelling results in the last part of the results. It is confusing to the reader that the titles already refer to the source of the modulation as "feedback". The titles should more accurately describe what is found, not pre-judge the interpretation.

      We thank the reviewer for those valuable suggestions. We have updated the subtitles to: “Nonlinear modulation of saccade choice on visual motion selectivity in LIP” and “Decision-correlated but not stimulus-correlated activity was modulated in LIP.”

      (6) page 8, l366-380. Can you link the statements more directly to panels in Figure 6. For Figure 6H-K, it needs to be clarified that the headers for 6D-G also apply to H-K.

      We have added headers for Figure 6H-K in the revised version, and revised the corresponding results section as follows.

      ‘We further examined how the energy landscape in the 1-D subspace changed in relation to task difficulty (motion coherence). Consistent with prior findings, trials with lower decision consistency (trials using lower motion coherence) exhibited shallower attractor basins at the time of decision for all types of RNNs (Fig. 6H-K). However, both the depth and the positional separation of attractor basins in the network dynamics significantly decreased for all non-zero motion coherence levels after the ablation of all feedback connections (comparing Figure 6I with Figure 6H; P(depth) = 5.20e-25, F = 122.80; P(position) = 1.82e-27, F = 137.75; two-way ANOVA). Notably, this reduction in basin depth and separation was more pronounced in the specific group compared to the nonspecific groups after ablating the feedback connections (comparing Figure 6J with Figure 6K; P(depth) = 2.65e-13, F = 57.35; P(position) = 3.73e-14, F = 61.79; two-way ANOVA). These results might underlie the computational mechanisms that explain the observed reduction in the decision consistency of RNNs following projection-specific inactivation: the shallower and closer attractor basins after ablating feedback connections resulted in less consistent decisions. This happened because the variability in neural activity made it more likely for population activity to stochastically shift out of the shallower basins and into nearby alternative ones.’

      (7) line 556-557: Please provide a reference or data for the assertion that nearby recording sites in LIP (100 microns apart) have similar RFs.

      The reviewer raised an interesting question that we are unable to address in depth with the current data, as we lack information on the specific cortical location for each recording session. In the original manuscript, we suggested that nearby recording sites in LIP have similar receptive fields (RFs), based on both our own experience with LIP recordings and previous studies. Specifically, we observed that neurons recorded within a single penetration using a single-channel electrode typically exhibited similar RFs. Similarly, the majority of neurons recorded from the same multichannel linear probe within a single session also showed comparable RFs. Additionally, several studies (both electrophysiological and fMRI) have reported topographic organization of RFs in LIP (Gaurav H. Patel et al., 2010; S. Ben Hamed et al., 2001; Gene J. Blatt et al., 1990).

      (8) Line 568, Methods: a response criterion of a maximum firing rate of 2 spikes/s seems very low, especially for LIP. How do the results change if this lifted to something more realistic like 5 spikes/s or 10 spikes/s?

      We chose this criterion to ensure we included as many neurons as possible in our analysis. To further clarify, we have plotted the distribution of maximum firing rates across all neurons. Based on our findings, raising this criterion is unlikely to affect the results, as the majority of neurons exhibit maximum firing rates well above 5 spikes/s, and many exceed 10 spikes/s. We hope this explanation addresses the concern.

      Author response image 6.

      Reviewer #2 (Recommendations For The Authors):

      In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".

      The data are generally interesting, and the manuscript is generally well written (but see some specific comments below on where I was confused). However, I'm still not sure about the conclusions. The way the experiment is setup, the "contra" saccade target is essentially in the same hemifield as the motion patch stimulus. Given that the RF's can be quite large, isn't it important to try to check whether the saccade itself contributed to the effects? i.e. if the RF is on the left side, and the "contra" saccade is to the left, then even if it is orthogonal to the location of the stimulus motion patch itself, couldn't the saccade still be part of a residual edge of the RF? This could potentially contribute to elevating the firing rate on the preferred motion direction trials. I think it would help to align the data on saccade onset to see what happens. It would also help to have fully mapped the neurons' movement fields by asking the monkeys to generate saccades to all screen locations in the monitor. The authors mention briefly that they used a memory-guided saccade task to map RF's, but it is also important to map with a visual target. And, in any case, it would be important to show the mapping results aligned on saccade onset.

      Another comment is that the authors might want to mention this other recent related paper by the Pack group: https://www.biorxiv.org/content/10.1101/2023.08.03.551852v2.full.pdf

      We thank the reviewer for the comments and realize that we did not explain our results clearly in the original manuscript. We agree with the reviewer that saccade direction selectivity might be a confounding factor for the modulation by saccade choice direction of LIP neurons’ responses to visual motion stimuli, because the RFs of LIP neurons might be large and the saccade target might fall within the edge of the RF. However, we believe that the observed modulation of saccade direction on LIP neurons’ response to motion stimuli cannot be simply explained by saccade direction selectivity. We presented several pieces of evidence to rule out this possibility. First, the modulation effect we observed was not linear; specifically, the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2I and Figure S1A-D). This phenomenon is unlikely to be attributed to a linear gain modulation driven by saccade directions. Second, we plotted the averaged neural activity for contralateral and ipsilateral saccade directions separately, aligned the activity to either motion stimulus onset or saccade onset, and found that LIP neurons showed similar levels of activity between the contralateral and ipsilateral directions (revised Figure 2L), which is not consistent with obvious saccade direction selectivity.

      To better control for this confound, we have added figures plotting the mean neural activity aligned to saccade onset for both contralateral and ipsilateral saccades, which are now included in the revised main Figure 2. These figures are presented in the detailed response below. Additionally, we have revised the corresponding results section to clarify our points, as outlined below:

      “Figure 2A-2F shows three example LIP neurons that exhibited significant motion coherence correlated DS. Surprisingly, LIP neurons showed greater DS in the CT condition than in the IT condition, even though the same motion stimuli were used in the same spatial location for both conditions. The averaged population activity showed this DS difference between CT and IT conditions for all four coherence levels (Figure 2G, 2H). During presentation of their preferred motion direction, LIP neurons showed significantly elevated activity in the CT relative to the IT at all coherence levels (Figure S1A, S1B, nested ANOVA: P(high) = 0.0326, F = 4.65; P(medium) = 0.0088, F = 7.03; P(low) = 0.0076, F = 7.32; P(zero) = 0.0124, F = 6.4), and a trend toward lower activity to the nonpreferred direction for CT vs. IT (Figure S1C, S1D, nested ANOVA: P(high) = 0.0994, F = 2.75; P(medium) = 0.0649, F = 3.12; P(low) = 0.0311, F = 4.73; P(zero) = 0.0273, F = 4.96). Most of the LIP neurons (48 of 83) showed such opposing trends in activity modulation between the preferred and nonpreferred directions (Figure 2I). These results indicated a nonlinear modulation of saccade choice on motion DS in LIP, aligned precisely with the response property of each neuron. This is unlikely to be driven by a linear gain modulation of saccade direction selectivity. Receiver operating characteristic (ROC) analysis further confirmed significantly greater motion DS in the CT condition than in the IT condition (Figure 2J and 2K; nested ANOVA: P(high) = 5.0e-4, F = 12.44; P(medium) = 9.53e-6, F = 20.91; P(low) = 9.33e-7, F = 26.03; P(zero) = 2.56e-8, F = 34.3). Such DS differences were observed even before stimulus onset. Moreover, LIP neurons exhibited similar levels of mean activity between different saccade directions (CT vs. IT) before monkeys’ saccade choice (Figure 2L), further supporting that saccade direction selectivity did not significantly contribute to the observed modulation of LIP neurons’ responses to motion stimuli.”

      We also thank the reviewer for pointing out this relevant study, which we had missed. We have added the suggested reference to the revised discussion section as follows:

      ‘A recent study demonstrated that neurons in the middle temporal area responded more strongly to motion stimuli when monkeys saccaded toward their RFs in a standard decision task with a fixed mapping between motion stimuli and saccade directions. This modulation emerged through the training process and contributed causally to the monkeys' following saccade choices. Consistently, we found that the response of LIP neurons to motion stimuli was more strongly correlated with the monkeys' decisions in the CT condition (saccades toward RFs) than in the IT condition, in a more flexible decision task. Together, these results suggest that the modulation of action selection on sensory processing may be a general process in perceptual decision-making. However, the observed modulation of saccade direction on LIP neurons' responses to motion stimuli cannot be simply explained by saccade direction selectivity. Several lines of evidence argue against this possibility. First, the modulation effect was nonlinear; specifically, neuronal firing rates increased for preferred motion directions but decreased for non-preferred directions (Figure 2I and Figure S1). This pattern is unlikely to be driven by a linear gain modulation based on saccade directions. Second, we found that LIP neurons exhibited similar levels of activity in both the CT and IT conditions (Figure 2L), which is inconsistent with the presence of clear saccade direction selectivity.’

      Some more specific comments are below:

      - I had a bit of a hard time with the abstract. It does not appear to be crystal clear to me, and it is the first thing that I am reading after the title. For example, if there is a claim about both perceptual decision-making and later target selection, then I feel that the task should be explained a bit more clearly than saying "flexible decision" task. Also, "..modulated by monkeys' following saccade choices directing outside each neuron's response field" was hard to read. It needs to be rewritten. Maybe just say "...modulated by the subsequent eye movement choices, even when these eye movement choices always directed the eyes away from the recorded neuron's response field". Also, I don't fully understand what "selectivity-specific feedback" means. Then, the concept of "consistency" in flexible decisions is brought up, again without much context. The above are examples of why I had a hard time with the abstract.

      We realize that our original statement may have been unclear and potentially caused confusion for the readers. Following the reviewer’s suggestions, we have revised the abstract as follows:

      ‘Neural activity in the primate brain correlates with both sensory evaluation and action selection aspects of decision-making. However, the intricate interaction between these distinct neural processes and their impact on decision behaviors remains unexplored. Here, we examined the interplay of these decision processes in posterior parietal cortex (PPC) when monkeys performed a flexible decision task, in which they chose between two color targets based on a visual motion stimulus. We found that the PPC activity related to monkeys’ abstract decisions about visual stimuli was nonlinearly modulated by their subsequent saccade choices, which were directed outside each neuron’s response field. Recurrent neural network modeling indicated that the feedback connections, matching the learned stimuli-response associations during the task, might mediate such feedback modulation. Further analysis on network dynamics revealed that selectivity-specific feedback connectivity intensified the attractor basins of population activity underlying saccade choices, thereby increasing the reliability of flexible decisions. These results highlight an iterative computation between different decision processes, mediated primarily by precise feedback connectivity, contributing to the optimization of flexible decision-making.’

      Specifically, selectivity-specific feedback refers to the feedback connections with positive or negative weights between selectivity-matched and selectivity-nonmatched unit pairs, respectively.

      Regarding "decision consistency," we define it as the degree to which the model’s decisions remain congruent under specific conditions. A higher level of decision consistency indicates that the model is more likely to produce the same choice each time it is presented with a stimulus under those conditions; in other words, it reflects decision reliability. We have revised the corresponding results section to make these concepts clearer.
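
      As an illustration only, the verbal definition of decision consistency given above could be operationalized as the fraction of repeated trials in which the most frequent choice is produced for a given condition. This is a minimal sketch under that assumption, not the authors' analysis code; the function name `decision_consistency`, the condition labels, and the toy choice data are all hypothetical.

```python
from collections import Counter


def decision_consistency(choices_by_condition):
    """Return, per condition, the fraction of trials matching the modal choice.

    Assumption (not from the manuscript): consistency is measured as the
    proportion of repeated presentations that yield the most common choice.
    A value of 1.0 means the same choice on every trial; 0.5 is the floor
    for a two-alternative task.
    """
    scores = {}
    for condition, choices in choices_by_condition.items():
        if not choices:
            raise ValueError(f"no trials recorded for condition {condition!r}")
        counts = Counter(choices)
        # most_common(1) returns [(choice, count)] for the modal choice.
        modal_count = counts.most_common(1)[0][1]
        scores[condition] = modal_count / len(choices)
    return scores


# Hypothetical toy data: ten repeated presentations per condition.
example = {
    "high coherence": ["red"] * 9 + ["green"],
    "zero coherence": ["red"] * 5 + ["green"] * 5,
}
scores = decision_consistency(example)
```

      Under this assumed metric, a condition in which the model picks the same target on nine of ten repeats scores higher than one in which choices split evenly, which matches the sense of "reliability" used in the response.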

      - Line 69: I'm not fully sure, but I think that some people might suggest that superior colliculus is also involved in the sensory aspect of the evaluation. But, I guess the sentence itself is correct as you write it. So, I don't think anyone should argue with it. However, if someone does argue with it, then they would flag the next sentence, since if the colliculus does both, then do the sensory and motor parts really employ distinct neural processes? Anyway, I think this is very minor.

      This is an interesting point. We have also noticed a recent study that demonstrates that the superior colliculus is causally involved in the sensory aspect of decision-making, specifically in visual categorization. However, the study also distinguishes between neural activity related to categorical decisions and that related to saccade planning. This suggests that the sensory and motor aspects of decision-making likely involve distinct neural processing, even within the same brain region, potentially reflecting separate populations of neurons. Therefore, we stand by our statement in the ‘next sentence’.

      - Line 79-80: you might want to look at this work because I feel that it is relevant to cite here: https://www.biorxiv.org/content/10.1101/2023.08.03.551852v2

      We have discussed this reference in the revised discussion section of the manuscript; please refer to the response above.

      - For a result like that shown in Fig. 2, I feel that it is important to show RF mapping with a saccade task alone. i.e. for the same neurons, have a monkey perform a delayed visually guided saccade task to all possible locations on the display, and demonstrate that there is no modulation by saccades to the targets. Otherwise, the result in Fig. 2 could reflect first an onset response by a motion, and then the saccade-related response that would happen anyway, even without the decision task. So, I feel that now, it is not entirely clear whether the result reflects this so-called feedback modulation, or whether simply planning the saccade to the target itself activates the neurons. With large RFs, this is a distinct possibility in my opinion.

      - Line 174: this would also be predicted if the neurons were responding based on the saccade target plan independent of the motion stimulus

      - On a related note, I would recommend plotting all data also aligned on saccade onset. This can help establish what the cause of the effects described is

      We understand the reviewer’s concern that the modulation might be related to saccade planning, and we acknowledge that the original manuscript might not adequately address this potential confound. Unfortunately, we did not map the LIP neurons' receptive fields (RFs) using a saccade-only task. However, as mentioned earlier, we believe that the modulation of LIP neurons' responses to motion stimuli based on saccade choice direction cannot be simply attributed to saccade direction selectivity. Several lines of evidence support this conclusion. First, the modulation we observed was nonlinear: the firing rate of neurons increased for the preferred motion direction but decreased for the non-preferred motion direction (Figure 2I and Figure S1A-D). This pattern is inconsistent with a simple linear gain modulation driven by saccade direction selectivity. Second, we directly compared LIP neuronal activity for contralateral and ipsilateral target conditions, and found no significant differences between the two. This suggests that saccade direction selectivity is unlikely to be the primary contributor to the observed modulation. In the revised figure, we added a plot (Figure 2L) that aligns neural activity to saccade onset, in addition to the original alignment to motion stimulus onset (Figure S1E). This new analysis further supports our interpretation.

      Author response image 7.

      - Even when reading the simulation results, I'm still not 100% sure I understand what is meant by this idea of "consistency" of flexible decision-making

      We have addressed this issue in our response to a previous comment; please refer to the response above.

    1. eLife Assessment

      The manuscript by Russell et al. investigates an important problem: the current lack of methods for early and accurate diagnosis of N. fowleri infection, which is >95% fatal. The authors provide solid evidence that a small RNA secreted by N. fowleri is detectable in biological fluids like blood and urine in a mouse model, and is present in cerebrospinal fluid and blood for a limited number of patient samples. This could potentially help with earlier diagnosis, which could save lives.

    2. Reviewer #1 (Public review):

      Summary:

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods are based on sampling cerebrospinal fluid, and are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Fig. 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Fig. 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post infection (Fig. 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Fig. 6B), and in whole blood from 2 out of 2 samples (Fig. 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths:

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples shows that whole blood could be tested for smallRNA-1 to diagnose N. fowleri infections. The potential for human blood or urine to be tested for N. fowleri could lead to critical early interventions.

      Weaknesses:

      There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood, serum, or human urine due to little to no sample material being available for testing. This limitation is examined thoroughly in the discussion section, and additional tests are beyond the scope of this work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods are based on sampling cerebrospinal fluid, and are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood from 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths:

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggests that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Thank you for the comments regarding the strengths of this study. We agree that our data for detecting the biomarker in biofluids from mice are convincing. In addition, our spike-in studies with human cerebrospinal fluid, plasma, and urine (Figure 6) suggest these biofluids from humans could be used for diagnosis.

      We appreciate the comment regarding plasma and recognize this was not fully explained in the manuscript. We do believe that plasma can be used to assess the biomarker. First, we demonstrated equivalent sensitivity of the method to detect smallRNA-1 in plasma and urine in mice with end-stage PAM (Figure 5). In addition, spike-in samples of human plasma, cerebrospinal fluid, and urine demonstrated equivalent sensitivity of detecting the biomarker (Figure 6).

      The negative result for human plasma in Figure 6C requires clarification; this sample was convalescent plasma from a survivor. The patient presented to the hospital on August 7, 2016, was treated, made a remarkable recovery, and was released from the hospital later that month. The plasma sample in Figure 6C was collected September 7, 2016, which is a month after treatment was initiated and weeks after the patient was symptom free. Our interpretation of the convalescent plasma result is that the patient had cleared the active amoeba infection, which is why we did not detect the biomarker. We have added text in the discussion and in the legend for Figure 6 to clarify the convalescent plasma result.

      One additional caveat for consideration is that many of the samples we received from amoebae-infected humans were stored at room temperature for undefined periods of time before being moved to <-20°C (see details in Table S9). We can’t rule out possible sample degradation, but this is an unfortunate reality of obtaining human samples from individuals later confirmed to be infected with pathogenic free-living amoebae.

      Weaknesses:

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section.

      We agree that the extremely limited availability of human samples is a limitation of this study. Given the rarity of these infections in the United States, even prospective studies to systematically collect samples would be very challenging. We hope that, by publishing the details of this biomarker detection method, we enable its use by diagnostic reference centers, especially in areas where outbreaks of multiple cases per year have been reported.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful.

      We agree; however, we would like to point out that the progression of disease in humans and mice is similar. Typically, patients survive between 10-14 days after presumed exposure, and mice have similar survival times following instillation of N. fowleri amoebae into a nare of the mouse. Therefore, detection of this biomarker as early as 72 h in mice is seemingly equivalent to the onset of initial symptoms in humans.

      Reviewer #2 (Public review):

      Summary:

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids.

      Strengths:

      (1) This study has a clear methodological approach, which allows for the reproducibility of the experiments.

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM.

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use.

      (4) Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy.

      (5) Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance.

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed.

      Thank you for summarizing the strengths of the study.

      Weaknesses:

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations.

      As noted in response to Reviewer #1 above, we agree this is a limitation of the study; however, we were fortunate to obtain even 15 µL samples of cerebrospinal fluid, plasma, serum, or whole blood from as many patients as we did. There is an urgent need for more systematic collection and storage of samples for rare diseases like primary amoebic meningoencephalitis so that advancements in diagnostics and biomarker discovery can be made. It is our sincere hope that, by publishing our detailed methods and experimental results in this manuscript, additional hospitals and research centers can replicate our studies and help advance this or other techniques for early diagnosis of PAM.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation.

      It is difficult to envision a method to obtain these biofluids from infected humans prior to onset of symptoms. More likely the best we can hope for is that physicians include primary amoebic meningoencephalitis in their assessment of patients that present with prodromal symptoms of meningitis.

      (3) Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized.

      See response to Reviewer #1 above.

      Reviewer #1 (Recommendations for the authors):

      (1) What is the evidence that these small RNAs are secreted specifically in EVs? I believe that they are, and ultimately it doesn't impact the conclusions, but I think the evidence here could be either stronger or presented in a more obvious way.

      Our data demonstrate that smallRNA-1 is present in N. fowleri-derived EVs (Figure 2 and Supplemental Figure 7) and in the intact amoebae (Figure 3B). Initial sequencing data to identify these smallRNA biomarkers came from PEG-precipitated EVs (Figure S1), by using methods we previously published (22). The PEG-precipitated EVs were extracted specifically for spike-in studies. Finally, the smallRNAs in EVs were confirmed after extraction of EVs from 7 N. fowleri strains (Figure 2). We do not have evidence that they are secreted outside of EVs.

      (2) The figure legends would be more useful with some additional information. For example: why are there two points for Nf69 in Fig 2B? In Figure 3A-B, please add more detail as to what the graphs are showing (are they histograms binned by a number of amoebae? This does not seem obvious to me).

      We agree the Figure legends should be edited for clarity and to add additional information. Both Figure legends have been updated.

      In Figure 2B, each point represents the mean of three technical replicates of EV preps for each N. fowleri strain.

      In Figure 3 the points indicate the Copy#/µL of a well from a 96-well plate. The histograms show the mean of these observations for each condition.

      (3) In Figure 2E, the FBS seems like it has near detectable levels of smallRNA-1 compared to Ac and Bm (albeit N. fowleri has 4 orders of magnitude higher levels than the FBS). Because cows are likely exposed to N. fowleri and have documented infections (e.g. doi: 10.1016/j.rvsc.2012.01.002), is it possible this signal is real?

      Thank you for making this interesting observation. We agree that cows are likely to have significant exposure to N. fowleri, yet documented infections are rare. In this case we do not believe the near detectable levels of smallRNA-1 in FBS were due to an infected donor animal. This noise was likely due to extracting RNA from concentrated FBS rather than FBS diluted in cell culture media. In addition, as shown in Supplemental Figure 4, the qPCR product from EVs extracted from FBS was not the same as that from the N. fowleri-derived EVs. Please note we used a PEG extraction reagent that separates lipid particles, so this is additional evidence the smallRNAs are present in EVs.

      (4) In Figure 6A, why was the sample size greater for water and unspiked urine? Similarly, why is the number of infected mice so variable in Figure 4B?

      In Figure 6A we assayed de-identified biofluids provided by Advent Hospital in Orlando, Florida. The plasma and serum samples were pooled from multiple individuals; whereas, individual urine samples (n=8) were provided for this experiment. We have updated the legend for Figure 6A to include these details.

      For Figure 4B we used plasma collected at the end-stage of disease following infections with five different strains of N. fowleri. The sample sizes varied for two reasons. First, Nf69 was the strain used most by our lab and we had plasma from several in vivo experiments. The lower sample sizes for the other strains came from an experiment with 8 mice per group. Some of these strains were less virulent, and infected mice did not always succumb to disease with the number of amoebae inoculated in this experiment. Thus, plasma was only collected from animals that were euthanized due to severe N. fowleri infections. In follow-up studies (e.g., Figure 5B), plasma was collected every 24 hr for analysis.

      Very minor points:

      (1) The number of acronyms (FLA, PAM, EVs, CNS, CSF, LOD) could be reduced to make this paper more reader-friendly.

      Acronyms that were used infrequently in the manuscript (FLA, CNS, LOD, mNGS, UC) have been edited to spell out the complete names. We kept the acronyms EVs and CSF because they are each used more than twenty times in the manuscript.

      (2) The decimal point in the Cq values is formatted strangely.

      The decimal points have been edited to normal format in both the manuscript and supplementary material.

      (3) Figure 3C is not intuitive. I do not understand the logic for the placement of the different samples (was row A only amoebae, B only Veros, C blank, D a mix, and F more Veros?).

      Thank you for this comment; we agree the microtiter plate schematic (Fig 3C) was misleading. We have revised Figure 3C to make the point that we tested amoebae alone, Vero cells alone, and we combined supernatants from Vero cells (alone) plus amoebae (alone) to confirm that 1) smallRNA-1 was only detected in amoeba-conditioned media, and 2) that Vero-conditioned media does not affect detection of smallRNA-1.

      Reviewer #2 (Recommendations for the authors):

      Minor corrections:

      The abbreviation 'Nf' for Naegleria fowleri is not appropriate in a scientific publication. According to taxonomic conventions, the correct way to abbreviate a scientific name is as follows:

      The first mention should be written in full: Naegleria fowleri.

      In subsequent mentions, the genus name should be abbreviated to its initial in uppercase, followed by a period, while the species name remains in lowercase: N. fowleri.

      The same rule applies to Balamuthia mandrillaris and Acanthamoeba species, which should be abbreviated as B. mandrillaris and Acanthamoeba spp. after their first mention.

      We agree, and each of the scientific names has been updated to the proper format. Please note Nf69 is the accepted nomenclature for this N. fowleri strain, so no changes were made when referring to this specific strain.

      Temperatures should be expressed in international units (°C). Please update the temperatures reported in Fahrenheit (°F) in the 'Materials and Methods' section, specifically in the 'Animal Studies' subsection.

      These changes were made in the revised manuscript.

    1. eLife Assessment

      This convincing study, which is based on a survey of researchers, finds that women are less likely than men to submit articles to elite journals. It also finds that there is no relation between gender and reported desk rejection. The study is an important contribution to work on gender bias in the scientific literature.

    2. Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).
      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.
      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

    3. Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

    4. Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

    5. Author response:

      The following is the authors' response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (*e.g.*, Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      Recommendation

      This paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates (or a lack thereof) should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process. I do, however, make a few suggestions below that the authors may (or may not) wish to address.

      We thank the author for this comment and for the following suggestions, which we take into account in our revision of the manuscript.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then 'we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates.'" There are a couple of issues with this statement.

      - First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be important reasons why not -- e.g., if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men's papers are intrinsically better than women's papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      - Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      In my opinion, the survey evidence reported here isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major -- or even minor -- contribution of your paper, so I would not mention policy interventions in the abstract. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!)

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own---i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      The referee is right: academic rank was added to control for career age of researchers, with the assumption that this variable would influence submission behavior. However, the rank information we collected was for the time that the individual respondent took the survey, which could be different from the rank they held concerning their submission behaviors mentioned in the survey. That is why we didn't consider rank as an independent variable of interest. But I do also agree with the reviewer that it could be related to their submission behaviors in some cases. Our initial analysis shows that academic rank is not a significant predictor of whether researchers submitted to SNP, but does contribute significantly to the SNP acceptance rates and desk rejection rates of individuals in Medical Sciences.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Basson et al. study the representation of women in "high-impact" journals through the lens of gendered submission behavior. This work is clear and thorough, and it provides new insights into gender disparities in submissions, such as that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. The results have broad implications for all academic communities and may help toward reducing gender disparities in "high-impact" journal submissions. I enjoyed reading this article, and I have several recommendations regarding the methodology/reporting details that could help to enhance this work.

      We thank the referee for their comments.

      Strengths:

      This is an important area of investigation that is often overlooked in the study of gender bias in publishing. Several strengths of the paper include:

      (1) A comprehensive survey of thousands of academics. It is admirable that the authors retroactively reached out to other researchers and collected an extensive amount of data.

      (2) Overall, the modeling procedures appear thorough, and many different questions are modeled.

      (3) There are interesting new results, as well as a thoughtful discussion. This work will likely spark further investigation into gender bias in submission behavior, particularly regarding the possible gendered effect of mentorship on article submission.

      Thank you for those comments.

      Weaknesses:

      (1) The GitHub page should be further clarified. A detailed description of how to run the analysis and the location of the data would be helpful. For example, although the paper says that "Aggregated and de-identified data by gender, discipline, and rank for analyses are available on GitHub," I was unable to find such data.

      We added the link to the GitHub page, as well as more details on how to run the statistical analysis. Unfortunately, our IRB approval does not allow for the sharing of the raw data.

      (2) Why is desk rejection rate defined as "the number of manuscripts that did not go out for peer review divided by the number of manuscripts rejected for each survey respondent"? For example, in your Grossman 2020 reference, it appears that manuscripts are categorized as "reviewed" or "desk-rejected" (Grossman Figure 2). If there are gender differences in the denominator, then this could affect the results.

      We thank the referee for pointing this out. Actually, what the referee is proposing is how we calculated it in the manuscript; the calculation mentioned in the manuscript was a mistake. We corrected the manuscript.
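
      To make the denominator question concrete, the following is a minimal sketch in Python that contrasts the two definitions discussed above: desk rejections as a share of all rejections, and desk rejections as a share of all submissions. The counts are hypothetical and are not survey data; the function and variable names are assumptions introduced only for illustration.

```python
def desk_rejection_rates(submitted, desk_rejected, rejected_after_review):
    """Return (share_of_rejections, share_of_submissions) for one respondent.

    All three arguments are manuscript counts; the names are illustrative.
    """
    total_rejected = desk_rejected + rejected_after_review
    if total_rejected > submitted:
        raise ValueError("rejections cannot exceed submissions")
    # Definition originally stated in the manuscript: denominator = rejected manuscripts.
    share_of_rejections = (
        desk_rejected / total_rejected if total_rejected else float("nan")
    )
    # Definition suggested by the referee: denominator = all submitted manuscripts.
    share_of_submissions = desk_rejected / submitted if submitted else float("nan")
    return share_of_rejections, share_of_submissions


# Hypothetical respondent: 10 submissions, 6 desk rejections, 2 rejections after review.
by_rejections, by_submissions = desk_rejection_rates(10, 6, 2)
print(f"share of rejections: {by_rejections:.2f}")
print(f"share of submissions: {by_submissions:.2f}")
```

      Under these assumed counts the two definitions give different rates for the same respondent, which is why a gender difference in the number of rejected manuscripts could change the result under the first definition.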

      (3) Have you considered correcting for multiple comparisons? Alternatively, you could consider reporting P-values and effect sizes in the main text. Otherwise, sometimes the conclusions can be misleading. For example, in Figure 3 (and Table S28), the effect is described as significant in Social Sciences (p=0.04) but not in Medical Sciences (p=0.07).

      We highly appreciate the suggestion. We've added Odds Ratio values and p-values to the main manuscript.
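
      As an illustration of the referee's point about multiple comparisons, the sketch below applies a Holm step-down adjustment to the two p-values quoted in the comment (0.04 and 0.07). Treating these two tests as the whole family is an assumption made only for the example; the manuscript reports more tests, and the choice of Holm rather than another correction is also an assumption, not the authors' method.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# p-values quoted in the referee's comment (Figure 3 / Table S28).
p_raw = {"Social Sciences": 0.04, "Medical Sciences": 0.07}
p_adj = dict(zip(p_raw, holm_adjust(list(p_raw.values()))))
for field, p in p_adj.items():
    print(f"{field}: raw p = {p_raw[field]}, Holm-adjusted p = {p:.2f}")
```

      With only these two tests in the family, neither adjusted value falls below 0.05, which shows how a label of "significant" versus "not significant" can hinge on whether a correction is applied.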

      (4) More detail about the models could be included. It may be helpful to include this in each table caption so that it is clear what all the terms of the model were. For instance, I was wondering if journal or discipline are included in the models.

      We appreciate the suggestion. We've added model details to the figure and table captions in the manuscript and the supplemental materials.

      Reviewer #3 (Public Review):

      Summary:

      This is a strong manuscript by Basson and colleagues which contributes to our understanding of gender disparities in scientific publishing. The authors examine attitudes and behaviors related to manuscript submission in influential journals (specifically, Science, Nature and PNAS). The authors rightly note that much attention has been paid to gender disparities in work that is already published, but this fails to capture the unseen hurdles that occur prior to publication (which include decisions about where to publish, desk rejections, revisions and resubmissions, etc.). They conducted a survey study to address some of these components and their results are interesting:

      They find that women are less likely to submit their manuscript to Science, Nature or PNAS. While both men and women feel their work would be better suited for more specialized journals, women were more likely to think their work was 'less novel or groundbreaking.'

      A smaller proportion of respondents indicated that they were actively discouraged from submitting their manuscripts to these journals. In this instance, women were more likely to receive this advice than men.

      Lastly, the authors also looked at self-reported acceptance and rejection rates and found that there were no gender differences in acceptance or rejection rates.

      These data are helpful in developing strategies to mitigate gender disparities in influential journals.

      We thank the referee for their comments

      Comments:

      The methods the authors used are appropriate for this study. The low response rate is common for this type of recruitment strategy. The authors provide a thoughtful interpretation of their data in the Discussion.

      We thank the referee for their comments

      Reviewer #4 (Public Review):

      This manuscript covers an important topic of gender biases in the authorship of scientific publications. Specifically, it investigates potential mechanisms behind these biases, using a solid approach, based on a survey of researchers.

      Main strengths

      The topic of the MS is very relevant given that across sciences/academia representation of genders is uneven, and identified as concerning. To change this, we need to have evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with high impact factor. While previous work has detected this gap, as well as some potential mechanisms, the current MS provides strong evidence, based on a survey of close to 5000 authors, that this gap might be due to lower submission rates of women compared to men, rather than the rejection rates. The data analysis is appropriate to address the main research aims. The results interestingly show that there is no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking, and be advised not to submit to prestigious journals

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, and actions to include other forms of measuring scientific impact and merit.

      We thank the referee for their comments.

      Main weakness and suggestions for improvement

      (1) The main message/further actions: I feel that the MS fails to sufficiently emphasise the need for a different evaluation system for researchers (and their research). While we might act to support women to submit more to high-impact journals, we could also (and several initiatives do this) consider a broader spectrum of merits (e.g. see https://coara.eu/). Thus, I suggest more space to discuss this route in the Discussion. Also, I would suggest changing the terms that imply that prestigious journals have a better quality of research or the highest scientific impact (line 40: journals of the highest scientific impact) with terms that actually state what we definitely know (i.e. that they have the highest impact factor). I think this could broaden the impact of the MS.

      We agree with the referee. We changed the wording on impact and added a few lines on this in the discussion.

      (2) Methods: while methods are all sound, in places it is difficult to understand what has been done or measured. For example, only quite late (as far as I can find, it's in the supplement) we learn the type of authorship considered in the MS is the corresponding authorship. This information should be clear from the very start (including the Abstract).

      We performed the suggested edits.

      Second, I am unclear about the question on the perceived quality of research work. Was this quality defined for researchers, as quality can mean different things (e.g. how robust their set-up was, how important their research question was)? If researchers have different definitions of what quality means, this can cause additional heterogeneity in responses. Given that the survey cannot be repeated now, maybe this can be discussed as a limitation.

      We agree that this can mean something different for different researchers; it probably varies by discipline, but also by gender. But that was precisely the point: whether men and women considered their "best work" to be published in a higher-impact venue. While there may be heterogeneity in those perceptions, the fact that 1) men and women rate their research at the same level and 2) we control for disciplinary differences should mitigate some of that.

      I was surprised to see that discipline was considered as a moderator for some of the analyses but not for the main analysis on the acceptance and rejection rates.

      We appreciate the attention to detail. In our analysis of acceptance and rejection rates, we conducted separate regression analyses for each discipline to capture any field-specific patterns that might otherwise be obscured.

      We added more details on this to clarify.

      I was also surprised not to see publication charges as one of the reasons asked for not submitting to selected journals. Low and middle-income countries often have more women in science but are also less likely to support high publication charges.

      That is a good point. However, both Science and Nature have subscription options, which do not require any APCs.

      Finally, academic rank was asked of respondents but was not taken as a moderator.

      Academic rank is included in the regression as a control variable (Figure 1).

      Reviewer #2 (Recommendations For The Authors):

      In addition to the points in the "Weaknesses" section of my Public Review above, I have several suggestions to improve this work.

      (1) Can you please indicate what the error bars mean in each plot? I am assuming that they are 95% confidence intervals.

      We appreciate the attention to detail. Yes, they are 95% confidence intervals. We've clarified this in the captions of the corresponding figures.

      (2) Can you provide a more detailed explanation for why the 7 journals were separated? I see that on page 3 of the supporting information you write that "Due to limited responses, analysis per journal was not always viable. The results pertaining to the journals were aggregated, with new categories based on the shared similarities in disciplinary foci of the journals and their prestige." Specifically, why did you divide the data into (somewhat arbitrary) categories as opposed to using all the data and including a journal term in your model?

      The survey covered 7 journals:

      • Science, Nature, and PNAS (S.N.P.)

      • Nature Communications and Science Advances (NC.SA.)

      • NEJM and Cell (NEJM.C.)

      We believe that the first three are a class of their own: they cover all fields (while NEJM and Cell are limited to (bio)medical sciences), and have a much higher symbolic capital than both Nature Comms and Science Advances (which are receiving cascading papers from Nature and Science, respectively). We believe that factors leading to submission to S.N.P. are much different than those leading to submission to the other groups of journals, which is why we separated the analysis in that manner.

      (3) You included random effects for linear regression but not for logistic regression. Please justify this choice or include additional logistic regression models with random effects.

      We used mixed-effect models for linear regressions (where number of submissions, acceptance rate, or rejection rate is the dependent variable). As mentioned in the previous comment, we tested using rank as the control variable and found it had a potential impact on the variables we analyzed using linear regressions in some disciplines. Therefore, we introduced it as a random effect for all the linear regression models.

      Reviewer #3 (Recommendations For The Authors):

      The limitations of this work are currently described in the Supplement. It may be helpful to bring several of these items into the Discussion so that they can be addressed more prominently.

      Added content

      Reviewer #4 (Recommendations For The Authors):

      (1) Line 40: add 'as leading authors of papers published in' before ' 'journals'

      Done

      (2) Explain what the direction in the ' relationship between' line 62 is

      Added

      (3) Lines 101-102 - this is a bit unclear. Please, provide some more info, also including what did these studies find.

      Added

      (4) Is 'sociodemographic' the best term in line 120

      Yes, we believe so.

      (5) Results would benefit from a short intro with the info on the number of respondents, also by gender.

      Those are present at the end of the intro (and in the methods, at the end). We nonetheless added gender.

      (6) Line 134: add how many women and men submitted to Science, Nature, and PNAS

      Added. In all disciplines combined, 552 women and 1,583 men ever submitted to these three elite journals. More details can be found in SI Table 9

      (7) Add 'Self-' before reported, line 141

      Added

      (8) Add sample sizes to Figs 1 and 2

      Those are in the appendix

      (9) Line 168 - unclear if this is ever or as their first choice

      We do not discriminate: it is whether they considered it at all.

      (10) Add sample size in line 177

      Added. 480 women and 1404 men across all disciplines reported desk rejections by S.N.P. journals.

      (11) I would like to see some discussion on the fact that the highest citation paper will also be a paper that the authors have submitted earlier in their careers given that citations will pile up over time.

      Those are actually quite evenly distributed. We modified the supplementary materials.

      (12) Data availability - be clear that supporting info contains only summary data. Also, while the Data availability statement refers to de-identified data on Github, the Github page only contains the code, and the note that 'The STAT code used for our analyses is shared. We are unable to share the survey response details publicly per IRB protocols.' Why were de-identified data not shared? This is extremely important to allow for the reproducibility of MS results. I would also suggest sharing data in a trusted repository (e.g. Dryad, ZENODO...) rather than on Github, as per current recommendations on the best practices for data sharing.

      Thank you for your careful reading and for highlighting the importance of clear data availability. We will revise our Data Availability Statement to explicitly state that the supporting information contains only summary data and that the complete analysis code is available on GitHub.

      We understand the importance of sharing de-identified data for reproducibility. However, our IRB strictly prohibits the sharing of any individual-level data, including de-identified files, to protect participant confidentiality. Consequently, the summary data included in the supporting information, together with the provided code, is intended to facilitate the verification of our core findings. Our previous statement regarding "de-identified" data sharing was inaccurate and thus has been removed. We apologize for the confusion.

      In light of your suggestion, we are also exploring depositing the summary data and code in a trusted repository (e.g., Dryad or Zenodo) to further align with current best practices for data sharing.

    1. eLife Assessment

      In this useful study, the authors perform voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) were recorded in the contralateral hemisphere. The authors conclude that synchronous ensembles of neurons are associated with theta rhythms but not with contralateral sharp wave-ripples. However, evidence for some of the paper's primary claims remains incomplete, due to limitations of the experimental approach.

    2. Joint Public Review:

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using innovative imaging techniques, the authors examined spike synchrony of hippocampal cells during locomotion and immobility states. The authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using existing methods.

      Weaknesses:

      The evidence for the main claim, that synchronous events are not associated with ripples, remains incomplete. As was mentioned in previous rounds of review, ripples emerge locally and independently in the two hemispheres. Thus, obtaining ripple recordings from the contralateral hemisphere does not provide solid evidence for this claim. The papers the authors are citing to make the claim that "Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations, which are known to co-occur across hemispheres (29-31)" do not support this claim. For example, reference 29 contains the following statement: "These findings suggest that ripples emerge locally and independently in the two hemispheres".